Model Training (image data)

This tutorial shows how to train a deep learning model on image data using Blind Learning, one of our privacy-preserving learning algorithms. Blind Learning lets you train advanced models without compromising the privacy of the data or the intellectual property of the model. With it, you can efficiently train your model on distributed datasets without ever "seeing" the data.

As a deep learning practitioner, you still have complete control over all phases of the model-training lifecycle, including:

  • Data collection and preparation
  • Model creation (architecture and hyperparameters)
  • Model validation
  • Inference

Scenario

This tutorial trains a neural network to classify handwritten digits using the well-known MNIST dataset. For illustration, however, we will pretend that you do not own this dataset and that it is not freely available online. This is the case for many real-world sensitive datasets, such as financial and healthcare data, and for other high-value datasets such as customer behavior data. We will use a copy of the MNIST dataset shared by another organization.

You will play the role of a deep learning practitioner who wants to build a classifier using this dataset. You will build and train an image classification model while the data remains private. The training also protects the intellectual property of your model's parameters and architecture.

Data Discovery

You can find the datasets available for data analysis and training through 🔗Assets in the web interface or directly from code via the TripleBlind SDK. Refer to the Access Points & Assets tutorial to learn more about finding and accessing assets.

You will use the web interface to search for the MNIST dataset and retrieve its UUID.

  • Visit the 🔗Gallery page
  • Search for "TUTORIAL - MNIST"
  • Open the dataset Details page and copy its ID
  • Paste the ID in the code below
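Because the ID is copied by hand, it can be worth checking that the pasted value is a well-formed UUID before using it in the tutorial code. The following is a minimal sketch that uses only Python's standard library. The helper name `normalize_asset_id` and the placeholder ID are hypothetical, chosen for illustration, and are not part of the TripleBlind SDK.

```python
import uuid


def normalize_asset_id(raw: str) -> str:
    """Return the canonical lowercase form of a pasted ID, or raise ValueError.

    Hypothetical helper for illustration; not part of the TripleBlind SDK.
    """
    cleaned = raw.strip()  # drop whitespace picked up while copying
    try:
        return str(uuid.UUID(cleaned))
    except ValueError as exc:
        raise ValueError(f"not a well-formed UUID: {raw!r}") from exc


# Placeholder value only; replace it with the ID copied from the Details page.
pasted = "  123E4567-E89B-12D3-A456-426614174000 \n"
asset_id = normalize_asset_id(pasted)
print(asset_id)  # 123e4567-e89b-12d3-a456-426614174000
```

A check like this catches a common slip, such as pasting the dataset name instead of its ID, before any request is made. It only checks the format and cannot tell whether the ID refers to an existing asset.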

The Dataset Details page also includes important information about the dataset, such as the number of images, size, and number of channels, as seen in the screenshot below.