TripleBlind User Guide

Privacy Assurances and Risk

TripleBlind offers a suite of capabilities to serve many different privacy needs. Privacy assurances depend on many factors, including the actual content of Datasets, and it would be misleading to state that risk is zero, even when operating blindly. This section explains the risk associated with each capability offered.

Risk Levels

To simplify understanding of risk, the following icons are used to designate risk level.

Safest

Provides HIPAA-level data privacy protection and meets GDPR standards for data privacy protection. For more information, refer to the external expert opinion, Privacy Analysis of TripleBlind’s Technology.

Safe

Provides high privacy protection with virtually no risk of inadvertent disclosure, but there are possible “edge cases” that prevent full assurance of privacy.

Safe with Care

Provides high privacy protection when set up and governed with proper procedure. Incorrect usage or procedure could result in a privacy leak.

Risk Summary

TripleBlind’s capabilities can be grouped into three primary categories: Machine Learning, Data Analysis, and Data Operations. The capabilities in each category are listed below; the risk level associated with each is described in its section later in this guide.

Machine Learning
  • Training
  • Inference

Data Analysis
  • Blind Sample
  • Outlier Detection
  • Exploratory Data Analysis (EDA)

Data Operations
  • Sentiment Analysis
  • Blind Match
  • Blind Report
  • Blind Stats
  • Blind String Search
  • Blind Join
  • Blind Query

Access Point Configuration

By default, a TripleBlind Access Point is configured to enable only the capabilities identified as Safe and Safest. If you want access to the Safe with Care capabilities and understand what is required to use them in a regulatory-compliant manner, they can be enabled on your Access Point.

Machine Learning

Training

TripleBlind supports a large number of model types and training methods, including:

  • Blind Learning (for shallow and deep learning networks)
  • Distributed Blind Learning (conventional machine learning; e.g., logistic regression)
  • Federated Learning (for neural networks)
  • Region of Interest Training
  • Random Forest Training
  • XGBoost Training
  • Recommender System Training (using both neural networks and conventional methods)
  • BERT Training
  • Regression

All of these methods are designed to provide the highest level of privacy protection. The techniques used in generating these models reveal no raw data outside of the Dataset Owner’s Organization.
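
To make this concrete, the sketch below illustrates the general idea behind federated training with a toy federated-averaging loop: each Dataset Owner trains locally and shares only model weights, never raw records. This is a conceptual illustration in plain Python, not TripleBlind’s implementation or SDK; all names and data are hypothetical.

    from typing import List

    def local_update(weights: List[float], local_data: List[List[float]], lr: float = 0.01) -> List[float]:
        """Run one local training pass and return updated weights (never the data)."""
        updated = list(weights)
        for row in local_data:
            features, label = row[:-1], row[-1]
            prediction = sum(w * x for w, x in zip(updated, features))
            error = prediction - label
            updated = [w - lr * error * x for w, x in zip(updated, features)]
        return updated

    def federated_average(owner_datasets: List[List[List[float]]], rounds: int = 5) -> List[float]:
        """Average locally computed weight updates; raw rows never leave an owner."""
        n_features = len(owner_datasets[0][0]) - 1
        global_weights = [0.0] * n_features
        for _ in range(rounds):
            updates = [local_update(global_weights, data) for data in owner_datasets]
            global_weights = [sum(ws) / len(ws) for ws in zip(*updates)]
        return global_weights

    # Two hypothetical owners, each holding rows of [feature, feature, label].
    owner_a = [[1.0, 2.0, 5.0], [2.0, 1.0, 4.0]]
    owner_b = [[0.5, 1.5, 3.5], [1.5, 0.5, 3.0]]
    print(federated_average([owner_a, owner_b]))  # only averaged weights are shared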

Inference

The TripleBlind inference toolset supports a wide range of Algorithms, from basic statistical queries to deep neural networks. Inferences made with the toolset preserve the privacy of both the model used and the data. TripleBlind uses state-of-the-art, mathematically backed methods to guarantee the privacy of all involved parties. Additionally, the Audit and Agreements features help protect against a wide range of model and data attacks, such as frequency, membership, and reconstruction attacks.

Data Analysis

Blind Sample

Blind Sample generates a realistic, privacy-preserving sample similar to the real data: strings are of similar length, integers fall in the same range, and floating point numbers have the same precision. When a column has been unmasked, a real sample value taken from the Dataset is returned in that column. This value is out of context from the row from which it was sampled.
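
As a hedged illustration of what a realistic but synthetic sample means here, the toy sketch below generates stand-in values that only mimic string length, integer range, and decimal precision. It is not the product’s sample generator; the function names and values are hypothetical.

    import random
    import string

    def sample_string(real: str) -> str:
        """Return a random string of the same length as the real value."""
        return "".join(random.choice(string.ascii_letters) for _ in range(len(real)))

    def sample_int(low: int, high: int) -> int:
        """Return a random integer drawn from the same range as the real column."""
        return random.randint(low, high)

    def sample_float(real: float, decimals: int) -> float:
        """Return a random float rounded to the same number of decimal places."""
        return round(random.uniform(0.0, abs(real) * 2 + 1), decimals)

    # Hypothetical column values; the outputs only mimic their shape and range.
    print(sample_string("Jane Doe"), sample_int(18, 90), sample_float(72.5, 1))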

Outlier Detection

The private analysis returns the indices of outlier rows, but never the actual data. Only the Dataset Owner can use these indices to determine the actual outlying data values.
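
A minimal sketch of the idea, using a simple z-score rule purely for illustration (not the product’s detection method): the requester receives only row indices, and only the Dataset Owner can resolve them back to values.

    def outlier_indices(values, z_threshold=3.0):
        """Return indices of values more than z_threshold standard deviations from the mean."""
        mean = sum(values) / len(values)
        std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
        if std == 0:
            return []
        return [i for i, v in enumerate(values) if abs(v - mean) / std > z_threshold]

    owner_data = [10, 11, 9, 10, 12, 250]    # stays with the Dataset Owner
    indices = outlier_indices(owner_data, z_threshold=2.0)
    print(indices)                           # the requester sees only [5]
    print([owner_data[i] for i in indices])  # only the owner can do this lookup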

Exploratory Data Analysis (EDA)

The EDA report generated for Datasets and visible in the web interface is created in a completely privacy-preserving manner. These reports do not include statistics unless a sufficient number of elements exist to prevent leaking private information.
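
As a hedged sketch of that suppression rule, the snippet below releases a statistic only when a minimum number of elements contribute to it; the threshold and code are hypothetical, not the product’s actual limits.

    def safe_mean(values, min_count=10):
        """Release the mean only when enough elements contribute; otherwise suppress it."""
        if len(values) < min_count:
            return None                      # suppressed: too few elements
        return sum(values) / len(values)

    print(safe_mean([67, 70, 72]))           # None -- statistic withheld
    print(safe_mean(list(range(60, 80))))    # 69.5 -- enough elements to report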

Data Operations

Sentiment Analysis

Sentiment Analysis completely protects data privacy. No data is ever reported; only a value representing the degree of positivity or negativity of the sentiment is returned.
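
A toy illustration of the principle (a trivial lexicon scorer, not TripleBlind’s model): the caller receives only a polarity score, never the analyzed text.

    POSITIVE = {"good", "great", "excellent", "happy"}
    NEGATIVE = {"bad", "poor", "terrible", "unhappy"}

    def sentiment_score(text: str) -> float:
        """Return only a polarity score in [-1, 1]; the text itself is never returned."""
        words = text.lower().split()
        hits = [1 for w in words if w in POSITIVE] + [-1 for w in words if w in NEGATIVE]
        return sum(hits) / len(hits) if hits else 0.0

    print(sentiment_score("The staff were great and the food was excellent"))  # 1.0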

Blind Match

Blind Match is TripleBlind’s implementation of a well-known secure multi-party computation known as Private Set Intersection. The calculation of the set intersection is completely private, and the only information delivered is information already known to the initiator, since they own one of the Datasets involved in the operation. The only knowledge leaked is membership in the other involved Datasets, and since this operation must be approved before it can be calculated, the assumption is that the approver is explicitly approving the passing of this knowledge to the counterparty.
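
The sketch below conveys only the outcome of a Private Set Intersection: the initiator learns which of its own identifiers also appear in the counterparty’s Dataset, and nothing more. Real PSI protocols achieve this cryptographically; the salted-hash comparison here is a simplification, and all names and values are hypothetical.

    import hashlib

    def blind(identifier: str, salt: str) -> str:
        """Hash an identifier with a shared salt (a stand-in for the real protocol)."""
        return hashlib.sha256((salt + identifier).encode()).hexdigest()

    salt = "shared-secret-salt"                              # hypothetical value
    initiator = {"alice@example.com", "bob@example.com"}     # initiator's Dataset
    counterparty = {"bob@example.com", "carol@example.com"}  # counterparty's Dataset

    initiator_blinded = {blind(x, salt): x for x in initiator}
    counterparty_blinded = {blind(x, salt) for x in counterparty}

    overlap = [initiator_blinded[h] for h in initiator_blinded if h in counterparty_blinded]
    print(overlap)  # ['bob@example.com'] -- only overlap with data the initiator already holds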

Blind Report

Blind Report allows you to position a database-backed query with predefined, configurable parameters. Users can configure the query using these predefined options, have it run against your database, and receive a report table. Because the Asset Owner has complete control over how the query can be parameterized, this is a very safe way to provide controlled, real-time Dataset views backed by a database.
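
As a hedged sketch (hypothetical query, table, and option names), the snippet below shows why predefined parameters keep this safe: the user can only choose among values the Asset Owner has whitelisted, so the shape of the query never changes.

    ALLOWED_REGIONS = {"north", "south", "east", "west"}    # predefined options
    ALLOWED_YEARS = {2021, 2022, 2023}

    QUERY_TEMPLATE = (
        "SELECT region, COUNT(*) AS patients, AVG(length_of_stay) AS avg_stay "
        "FROM admissions WHERE region = %s AND admit_year = %s GROUP BY region"
    )

    def build_report_query(region: str, year: int):
        """Accept only whitelisted parameter values; the query shape never changes."""
        if region not in ALLOWED_REGIONS or year not in ALLOWED_YEARS:
            raise ValueError("parameter is not one of the predefined options")
        return QUERY_TEMPLATE, (region, year)   # handed to the database driver as-is

    print(build_report_query("north", 2023))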

Blind Stats

Federated descriptive statistics are a powerful single-point query operation that allows the Asset User to understand populations scattered across remote Datasets. This is usually a very safe operation; as long as the requested statistics target cohort groupings with more than a few records, there is very little risk of re-identification from the output of this operation. To protect against this risk, the Dataset Owner(s) have full visibility into the nature of the statistics requested, as well as the ability to approve or deny the request. Furthermore, TripleBlind automatically rejects Blind Stats operations that request ordered statistics on groups that do not meet the k-Grouping record limit of the involved Datasets.
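
The snippet below is a hypothetical illustration of the k-Grouping idea, with an assumed k of 10: grouped statistics are rejected whenever any requested cohort falls below the record limit.

    from collections import Counter

    def grouped_counts(rows, group_key, k=10):
        """Release per-cohort counts only when every cohort has at least k records."""
        counts = Counter(row[group_key] for row in rows)
        if any(c < k for c in counts.values()):
            raise ValueError("rejected: a requested cohort falls below the k-Grouping limit")
        return dict(counts)

    rows = [{"diagnosis": "A"}] * 25 + [{"diagnosis": "B"}] * 3   # hypothetical records
    try:
        print(grouped_counts(rows, "diagnosis", k=10))
    except ValueError as err:
        print(err)   # cohort "B" has only 3 records, so the whole request is rejected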

Blind String Search

The calculation of the counts is completely private. However, the returned counts do reveal some information about membership in the involved Datasets. For example, a contrived Blind String Search could affirm membership in the source Dataset.
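
A contrived illustration of that membership risk, with hypothetical data: a search term specific enough to match a single person turns a simple count into a disclosure.

    records = ["Jane Q. Exampleton", "John Smith", "Anna Smith", "Mary Jones"]  # hypothetical

    def blind_count(term: str) -> int:
        """Return only the number of matching records, never the records themselves."""
        return sum(1 for r in records if term.lower() in r.lower())

    print(blind_count("smith"))               # 2 -- a common term reveals little
    print(blind_count("jane q. exampleton"))  # 1 -- confirms this person is in the Dataset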

Blind Join

The calculation technique used to generate the intersection (exact or fuzzy) is privacy-preserving. However, the output data fields do include raw data. Fuzzy matching enables obtaining intersections where the matching field (e.g., Address) contains formatting or data entry differences, but it has the potential for misuse if set to allow very inexact matches. Unless an Agreement has been established permitting auto-approval of requests, all Blind Joins require an informed Access Request approval. Blind Join is not permitted to return any columns the Asset Owner has masked, the assumption being that the underlying values in those columns contain PII/PHI or otherwise sensitive information. k-Grouping is also respected in the Blind Join operation as a minimum record threshold on the output; a join that would result in fewer than k records automatically fails with a warning message.
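
As a hedged sketch (not the product’s matcher), the snippet below uses a simple similarity ratio to show both points: fuzzy matching tolerates formatting differences in a join field such as Address, while a result smaller than k records is rejected. The threshold, k value, and data are hypothetical.

    from difflib import SequenceMatcher

    def similar(a: str, b: str) -> float:
        """Return a 0-1 similarity ratio between two field values."""
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def fuzzy_join(left, right, threshold=0.8, k=10):
        """Match rows whose join fields are similar enough; fail if the output is below k."""
        matches = [(l, r) for l in left for r in right if similar(l, r) >= threshold]
        if len(matches) < k:
            raise ValueError("join rejected: result is smaller than the k record threshold")
        return matches

    left = ["123 Main St.", "45 Oak Avenue"]
    right = ["123 Main Street", "999 Elm Road"]
    print(round(similar("123 Main St.", "123 Main Street"), 2))  # ~0.81, matches at 0.8
    try:
        fuzzy_join(left, right)
    except ValueError as err:
        print(err)   # only one pair matches, so the join output is withheld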

Blind Query

A Blind Query is a very powerful operation that allows the Dataset Owner to safely permit reports to be generated from protected data. The Dataset Owner is responsible for validating that the report protects privacy appropriately. For example, a SQL query would need to contain a WHERE or HAVING clause ensuring that a “summary” report does not report a category consisting of a single member of a Dataset, since the average of a single value would effectively report a raw value. Blind Query is not permitted to return any columns the Asset Owner has masked, the assumption being that the underlying values in those columns contain PII/PHI or otherwise sensitive information. k-Grouping is also respected in the Blind Query operation as a minimum record threshold on the output; a query that would result in fewer than k records automatically fails with a warning message.
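
As a hedged illustration with a hypothetical schema and an assumed k of 10, the query below shows one way such a guard can look: the WHERE clause scopes the population and the HAVING clause suppresses categories too small for an average to be safe.

    K_MINIMUM = 10   # assumed k-Grouping threshold for this illustration

    safe_summary_query = f"""
        SELECT diagnosis, COUNT(*) AS n, AVG(length_of_stay) AS avg_stay
        FROM admissions
        WHERE discharge_year = 2023
        GROUP BY diagnosis
        HAVING COUNT(*) >= {K_MINIMUM}
    """
    print(safe_summary_query)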