Model Training (image data)

This tutorial focuses on training a deep learning model on image data using an advanced privacy-preserving learning algorithm called Blind Learning. Blind Learning is one of our innovative algorithms; it enables training advanced models without compromising the privacy of the data or the intellectual property of the model. Using Blind Learning, you can efficiently train your model on distributed datasets without ever "seeing" the data.

As a deep learning practitioner, you still have complete control over all phases of the model-training lifecycle, including:

  • Data collection and preparation
  • Model creation (architecture and hyperparameters)
  • Model validation
  • Inference


This tutorial trains a neural network to classify hand-written digits using the famous MNIST dataset. However, for illustration we will pretend that you do not own such a dataset and it is not freely available online. In practice, this is true for many real-world sensitive datasets, such as financial and healthcare data, or other highly-valued datasets such as customer behavior. We will use a copy of the MNIST dataset shared by another organization.

You will play the role of a deep learning practitioner who wishes to build a classifier using this dataset. You will be able to build and train an image-classification model while maintaining the total privacy of the data. The training also preserves the intellectual property of your model's parameters and architecture.

Data Discovery

You can find the datasets available for data analysis and training using the 🔗Assets page or directly from code via the TripleBlind SDK. Please refer to the Access Points & Assets tutorial to learn more about finding and accessing assets.

You will use the web interface to search for the MNIST dataset and retrieve its UUID.

  • Visit the 🔗Gallery page
  • Search for "TUTORIAL - MNIST"
  • Open the dataset Details page and copy its ID
  • Paste the ID in the code below

The Dataset Details page also includes important information about the dataset, such as the number of images, size, number of channels, etc. as seen in the screenshot below.

import warnings
import torch
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt

import tripleblind as tb


#### access the dataset using its UUID
dataset = tb.Asset("27e6e5e6-c281-425e-8a82-06e3d0c8dcc9")

if not dataset.is_valid:
    print("Wrong UUID, or dataset not found")
else:
    print(f"Successfully loaded: {dataset}")

Data processing

TripleBlind's SDK provides a wide range of functions that allow you to preprocess data for any advanced data analysis task, including training deep neural networks.

In the following, we will use the ImagePreprocessor to apply a set of data transformations to prepare for our training task. For this simple example you can apply the following:

  • Create a preprocessor
  • Define the target_column. This is specific to our API, which lists the classes of all images in a CSV file.
  • Resize all images to 28x28.
  • Convert all images to grayscale.
  • Set image dimensions to channels_first, i.e., (C, H, W).
  • Set the dtype of the numpy arrays representing the images to float32.

ℹī¸If your training task uses distributed datasets, you do not have to define a separate preprocessor for each dataset. All data processing will apply to all datasets automatically -- assuming the distributed datasets are horizontally distributed (i.e., they are of the same type and features).

pb = tb.preprocessor.image.ImagePreprocessor.builder()
pb.resize(28, 28)
pb.convert("L")  # convert to grayscale
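For intuition, the transformations listed above can be reproduced locally with plain PIL and NumPy. This sketch uses only standard imaging libraries, not the TripleBlind preprocessor, and the sample image is a synthetic stand-in since the real dataset is never visible to you:

```python
import numpy as np
from PIL import Image

# Synthetic RGB image standing in for a raw dataset sample
# (assumption: real samples arrive in arbitrary sizes and color modes)
img = Image.new("RGB", (64, 48), color=(120, 30, 200))

img = img.resize((28, 28))  # resize to 28x28
img = img.convert("L")      # convert to grayscale

arr = np.asarray(img, dtype=np.float32)  # dtype float32
arr = arr[np.newaxis, :, :]              # channels_first: (C, H, W)

print(arr.shape, arr.dtype)  # (1, 28, 28) float32
```

The TripleBlind preprocessor applies the equivalent steps on the data owner's side, so the raw images never need to leave their access point.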

Network definition

The SDK provides a NetworkBuilder class which supports assembling a network using simple methods to add layers and configurations, similar to PyTorch. After creating a NetworkBuilder you can use your editor's automatic completion function to find all supported layers or see the reference documentation for the full list of supported layers.

Notice in the network architecture below the command to add a special layer, builder.add_split(). The Split layer is specific to TripleBlind's Blind Learning algorithm. It enables training distributed deep learning models without having to share the datasets with the model creator. Instead, the model is split into two parts and distributed among data holders and the model creator. The Split layer ensures that only model parameters from a single layer are exchanged during the training process -- ensuring the privacy of the involved datasets.

⚠ī¸Placing the Split layer too early in the network will reduce the number of layers located at the data-owner side while increasing the computational needs at the server-side. We recommend placing the Split layer after the flatten_layer to balance both the privacy and the computational burdens on both the data owner and the algorithm provider.

#### Define the neural network we want to use for training
training_model_name = "example-mnist-network-trainer"

builder = tb.NetworkBuilder()
builder.add_conv2d_layer(1, 32, 3, 1)
builder.add_max_pool2d_layer(2, 2)
builder.add_conv2d_layer(32, 64, 3, 1)
builder.add_max_pool2d_layer(2, 2)
builder.add_flatten_layer()
builder.add_split()  # boundary between the data-owner and model-creator halves

builder.add_dense_layer(1600, 128)
builder.add_dense_layer(128, 10)
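As a local sanity check on the layer sizes, and in particular the 1600 input features of the first dense layer (64 channels of 5x5 feature maps after two conv+pool stages on a 28x28 input), here is the equivalent architecture in plain PyTorch. This is only for shape verification on your own machine; it is not how the SDK builds or trains the model:

```python
import torch
import torch.nn as nn

# Plain-PyTorch mirror of the NetworkBuilder architecture (shape check only)
model = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, stride=1),   # 28x28 -> 26x26
    nn.MaxPool2d(2, 2),                          # 26x26 -> 13x13
    nn.Conv2d(32, 64, kernel_size=3, stride=1),  # 13x13 -> 11x11
    nn.MaxPool2d(2, 2),                          # 11x11 -> 5x5
    nn.Flatten(),                                # 64 * 5 * 5 = 1600
    # The Split layer would sit here in the Blind Learning version.
    nn.Linear(1600, 128),
    nn.Linear(128, 10),
)

x = torch.zeros(1, 1, 28, 28)  # one grayscale 28x28 image
print(model(x).shape)          # torch.Size([1, 10])
```

If you change the input resolution or the convolutional layers, rerun this check to recompute the flattened size before updating the first dense layer.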