Release Notes

This page documents the notable changes from each new version of the TripleBlind Router, Access Point and SDK. The most recent release is always at the top.

Release 2.1.0 – August 14, 2024

Enhanced Administrative and Teams Support

The tb utility now has more management commands for organization and team owners and managers.

Transfer team ownership:
tb admin team set-owner TEAM [email protected]
Remove users from a team (for owners and users with management permissions only):
tb team remove USER [TEAM]
List all users in the organization:
tb admin users list
NOTE: The old behavior, which listed only members of teams the active user was part of, can now be accessed using tb team members.

Full documentation of tb commands is available in the User Manual’s Command Line Utility section.

Blind Learning with Natural Language Processing

Blind Learning now supports sequence classification tasks for NLP. See the reorganized examples/NLP folder in the SDK for details and examples of the growing number of Natural Language Processing tasks.

Other changes and bug fixes

Access Point connection tokens are no longer deleted if the owner generates new tokens. This was a fairly common issue which accidentally disconnected the AP if the owner reset their personal SDK token. If there is a need to intentionally disable an AP, the owner can either shut it down or work with TripleBlind Support to reset their token.
Cleaner handling when assets which are part of a federated group are deleted. The now-incomplete group is removed from the federation and both the asset owner and the federation owner are notified of this resignation.
The tb utility now supports command line TAB completion under Linux/Apple. Run tb install and restart your terminal to enable this.
Resolve several recent CVEs in library dependencies.

Release 2.0.0 – July 2, 2024

Teams Support / New API Version

Back in release 1.55 we introduced the concept of Teams to help organizations better manage departments and projects, along with the assets, agreements, processes and users involved in these collaborative efforts. A user can be in multiple teams, and assets are only owned by a single team (although they can be shared between teams using agreements). Both the web interface and the SDK allow a user to select their active team, defining the ownership for all work done. Learn more about Managing Teams under the Administration Guide and Working With Teams under the User Guide.

In support of this transition, we have been supporting a “default team”. However, that left some ambiguous cases, so this release includes a major API version number update to 2.0. This breaks backwards compatibility with older SDKs, and many methods in the SDK have new parameters to make it easier to work with teams. Additionally, organization owners are now able to create and manage their own teams without assistance from TripleBlind.

We’ve also taken this opportunity to clean up some of the early API design decisions made 4 years ago, reorganizing some APIs into better logical structure and unifying terminology in a natural way. Finally, we have transitioned the SDK to Python 3.10 ensuring support for years to come.

This transition should be minimal effort for most, and we appreciate your understanding as we continue to expand our capabilities!

Blind Learning with Natural Language Processing

Extracting insights from the growing body of notes, documents, transcriptions and other forms of text is becoming increasingly important in this business environment. However, the state of the art had no consideration for privacy, and collaboration required absolute trust amongst data holders and with any researcher attempting to train models with this data. TripleBlind has added this critical piece, bringing the peace-of-mind guarantees of Blind Learning to NLP.

See the examples/Natural_Language_Processing for more details and a live example.

Other changes and bug fixes

The command line utility now has administrative capabilities under the tb admin group, including management of organization Owners and Teams.
Teams can be managed via the new tb team add commands.
The Blind Report pages have been polished with several small interface changes.
The Access Point Admin page has been refined, breaking the system performance into a dedicated tab.

Release 1.58.0 – May 29, 2024

Databricks Support

The new DatabricksDatabase and DatabricksReport objects in the SDK expand the suite of database connections to include Databricks catalog/schema storage. See the examples/Database_Connectors/Databricks.py script in the SDK for more information.

Blind Report enhancements

The Blind Report system has been further refined to make it simpler and easier to use than ever:

Custom codes can be typed in a Code Lookup field.
More validations help prevent errors when building custom reports.
Landlock can now be optionally disabled for preprocessing and postprocessing scripts when running on older Linux kernels (e.g. Ubuntu 20.04).

Other changes and bug fixes

Add table of content links to documentation portal. This also makes linking to sections of documentation easier.
PSI VP XGBoost can now be initiated by any party, not just the model owner.
Resolve several recent CVEs in library dependencies.

Release 1.57.0 – April 25, 2024

Federated Blind Reports

Blind Reports have been expanded to support operation with a group of data providers. A single organization can act as the “Aggregator” representing this group, managing the reports and agreements to use these reports for the group. A federated report can generate detailed statistics for the group, e.g. a collection of hospitals jointly creating a cohort of patients with a rare disease. Read more about it in the Blind Reports documentation and peruse example code in the examples/Blind_Reports and demos/Hospital_Data_Federation examples in the SDK.

XGBoost across multiple data providers

XGBoost models can now be trained and inferred on using vertically partitioned data. This allows data owned by multiple parties to be virtually joined on a common key, with each party providing a different subset of features for the “virtual record”. See the examples/PSI_VP_XGBoost example in the SDK.
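To make the idea of a “virtual record” concrete, the sketch below joins two hypothetical feature tables on a shared key using plain Python dictionaries. It only illustrates the data layout: the party names, keys and features are invented, and unlike the actual protocol this ordinary join offers no privacy protection.

```python
# Illustrative only: a plain, non-private join showing what a "virtual record" is.
# The party names, keys and feature names are made up for this sketch.


def build_virtual_records(*partitions):
    """Join per-party feature dicts on their shared keys (inner join)."""
    # Only keys present in every partition produce a virtual record.
    common_keys = set(partitions[0])
    for part in partitions[1:]:
        common_keys &= set(part)

    records = {}
    for key in sorted(common_keys):
        merged = {}
        for part in partitions:
            merged.update(part[key])  # each party contributes its own features
        records[key] = merged
    return records


bank = {"c1": {"income": 52000}, "c2": {"income": 61000}, "c3": {"income": 48000}}
insurer = {"c2": {"claims": 1}, "c3": {"claims": 0}, "c4": {"claims": 3}}

virtual = build_virtual_records(bank, insurer)
print(virtual)
```

Only the keys held by both parties (c2 and c3 here) yield a virtual record, and each record carries the features contributed by every party.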

Other changes and bug fixes

Database connections perform exponential backoff retries. This allows “sleeping” databases to wake up to service requests, as well as providing general robustness.
Allow any ReportParameter to be optional using the new “required=False” argument.
Add a warning when “@” is part of the username for MSSQL data sources -- this is not supported.
The tb version command now shows information about the current user, their organization and their active team.
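The exponential backoff retries mentioned in the first item above follow a well-known pattern. The sketch below is a generic illustration of that pattern, not the Access Point's actual implementation: the function name, the retry count and the delays are all assumptions chosen for the example.

```python
import time


def retry_with_backoff(connect, max_attempts=5, base_delay=1.0, factor=2.0,
                       sleep=time.sleep):
    """Call connect() until it succeeds, multiplying the wait after each failure."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)
            delay *= factor


# Demo: a fake "sleeping" database that only answers on the third attempt.
calls = {"count": 0}
waits = []


def flaky_connect():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("database is still waking up")
    return "connected"


# Record the requested waits instead of really sleeping, so the demo is instant.
result = retry_with_backoff(flaky_connect, sleep=waits.append)
print(result, waits)
```

Passing the sleep function as a parameter keeps the sketch testable; real code would normally leave the default time.sleep in place.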

Release 1.56.0 – March 14, 2024

Report Interface Revamp and “Code” Field Type for Reports

The web interface to Reports has been dramatically improved, making reports and report results easier to find and reports easier to run. The new Report Library page is the center of this interaction, putting all of your reports at your fingertips.

Once you find your report, the asset view provides tabs for running a new report or viewing the results and retrieving the output of previous runs. Parameters used in previous reports are quickly accessible, making it easy to find the results you want.

Additionally, the new “Code” field allows easy user exploration and selection of report parameters where the value is a well-known code, such as ICD-9 or ICD-10 diagnostic codes. Users can search for either the code directly, or intelligently search the description field associated with the code. Only valid codes can be selected for injection into the template. See the example/Blind_Report scripts in the SDK for more information.

Split Records Statistics (aka Vertically Partitioned PSI Statistics)

The Statistics operation can now work on data where different pieces of information are housed in different data assets -- even assets spread across multiple owners. Conceptually, a key is used to find matching items in all of the datasets, which then get merged into a single virtual table consisting of columns from all of the distributed data tables. Of course, the actual implementation does it in a way which protects privacy for all the participants, not revealing any details to other data owners or any information beyond the statistical summary to the invoker. See the example/Statistics/2_get_split_record_statistics.py script in the SDK for an example of this in action.

Oracle DBMS Support

The new OracleDatabaseDataset and OracleDatabaseReport objects in the SDK expand the suite of database connections to include Oracle. See the examples/Database_Connectors/OracleDB.py script in the SDK for more information.

Team Management

User teams are easier to manage than ever. Anyone with Invite privileges can add a new user to the team, and those with Manage privileges can also pre-assign permissions at the time the user is invited. Users can be invited to multiple teams before they

Other changes and bug fixes

Add whole database positioning and the schema report operation.
Improved the performance and accuracy of Private Record Linkage while also enhancing privacy guarantees.
Improve management of Access Points which are intermittently offline.
Support retry for databases utilizing a “sleep” feature, which only wake up after an initial attempt fails.
SECURITY: Revamped the Access Point design to run internally as a non-root user, limiting potential escalation attacks.
SECURITY: Updated several library dependencies to address known vulnerabilities (CVEs).

Release 1.55.0 – February 5, 2024

Teams

Teams provide an exciting new option to manage and simplify working within your organization. Organizations can have an unlimited number of teams, and each member of an organization can be part of one or more teams. Agreements and assets are shared at the team level, reducing clutter for other teams. For example, when the “Research” team is training a regression model, the training data and interim models will not be visible to the “Human Resources” team, nor will HR compliance reports be visible to the researchers.

Existing and new organizations will only have a single team by default, so this change won’t impact you until you need it. Learn more about Managing Teams under the Administration Guide and Working With Teams under the User Guide.

Secrets Manager

Blind Report assets which connect to databases can now utilize named secrets in the connection strings. This improves security within your organization, only requiring the asset creator to know the name of the secret value, not the actual secret value. Additionally, named values can be easily updated without rebuilding assets, simplifying security practices such as rotating access keys. More information can be found in the Administration Guide under Secrets Management and in the ReportAsset documentation.

Other changes and bug fixes

A new Asset picker makes looking up assets far quicker and more intuitive within the web interface.
Blind Report parameters can now be found by typing substrings, making it easier to work with long lists of report options.
Add Date/Time parameters for Blind Reports. See the new ReportParameter.create_datetime() method.
SQL used to define a Blind Report is now validated before the report is created. This catches and reports basic syntax errors, displaying the error for easy correction.
Retrieved assets now have more meaningful names based on the Asset Name when downloaded from the web interface.
The User Permissions page no longer allows changes to organization owner permissions (the changes were ignored anyway). Additionally, users are now displayed in alphabetical order, not order of creation.

Security Updates

SECURITY: Reduce the web interface session timeout to 4 hours.
SECURITY: Signed the Windows installation script, certifying it has been unaltered.
SECURITY: Close a potential security hole if a malicious actor attempted to replay a web request enabling multi-factor authentication (MFA/2FA).
SECURITY: Tighten Content-Security-Policy settings on some Router responses.

Release 1.54.0 – December 18, 2023

Private Record Linkage

The existing Private Set Intersection (PSI) provides an excellent method for working privately with other parties when there is a well-defined identifier that is common to all involved datasets, such as a social security number. However, there are many cases where no such identifier is available and other types of fields need to be examined to determine if records belong to the same individual. Complicating this further, information such as names and addresses is often messy -- such as a middle name recorded as “Steve” or “Steven” or “Stephen” or a simple “S”.

Private Record Linkage uses advanced techniques to compare multiple fields to establish a confidence level that a link exists between records. Cryptographic techniques ensure that no information is leaked to the other parties about any individual they don’t already know from their own dataset. All is accessible using simple code, such as this:

table = tb.TableAsset.find_linked_records(
   datasets=[my_data, their_data],
   match_columns=[
       ("LAST", "LAST_NAME"),  # different field names between datasets
       ("FIRST", "FIRST_NAME"),
       ("MIDDLE", "MIDL_NAME"),
       "HOUSE_NUM",
       "STREET_NAME",
       "STREET_TYPE_CD",
       ("ZIP", "ZIP_CODE"),
       ("PHONE", "PHONE_NUM"),
       ("GENDER", "SEX_CODE"),
       "AGE",
   ],
   # The match_threshold is optional, the default of 0.88 works well typically
   match_threshold=0.88,
)

See the example/Private_Record_Linkage script in the SDK for more information.
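For intuition about how several messy fields can combine into a single confidence level, here is a deliberately simple, non-private sketch using Python's standard difflib module. It is not the algorithm TripleBlind uses and involves no cryptography: the scoring rule (a plain average of per-field string similarity) and the sample records are assumptions made for illustration. Only the tuple-or-string column convention and the 0.88 threshold are taken from the call above.

```python
from difflib import SequenceMatcher


def field_similarity(a, b):
    """Similarity in [0, 1] between two field values, ignoring case and padding."""
    return SequenceMatcher(None, str(a).strip().lower(), str(b).strip().lower()).ratio()


def link_confidence(left, right, match_columns):
    """Average per-field similarity over the compared columns."""
    scores = []
    for column in match_columns:
        # A tuple maps differing field names; a plain string is shared by both sides.
        left_name, right_name = column if isinstance(column, tuple) else (column, column)
        scores.append(field_similarity(left[left_name], right[right_name]))
    return sum(scores) / len(scores)


# Made-up records: the same person with the first name spelled two ways.
mine = {"LAST": "Johnson", "FIRST": "Steven", "ZIP": "64108", "AGE": 41}
theirs = {"LAST_NAME": "Johnson", "FIRST_NAME": "Stephen", "ZIP_CODE": "64108", "AGE": 41}
columns = [("LAST", "LAST_NAME"), ("FIRST", "FIRST_NAME"), ("ZIP", "ZIP_CODE"), "AGE"]

confidence = link_confidence(mine, theirs, columns)
print(round(confidence, 3), confidence >= 0.88)
```

Because three of the four fields agree exactly, the differing spelling of the first name lowers the score only slightly and the pair still clears the threshold.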

Security Enhancements

Various aspects of connection security have been improved, thwarting different attack vectors which might have been attempted in the future. No exploitation has occurred; these are preventative measures.

End-to-end encryption of traffic between Access Points.
Although the connection itself was encrypted, the content inside the connection was not and could be visible in certain situations, such as a man-in-the-middle attack or a dishonest participant who abuses systems such as load balancers to peek at the traffic before it reaches the access point. While no raw private information is passed between participants, this still completely protects all operational details between APs.
Switching to encrypted JSON Web Tokens (aka JWTs) and tightening up usage and lifetime of tokens.
This hides the operational details which can be encoded within a JWT and prevents things such as replay attacks which attempt to re-use a valid token.

Unfortunately these changes will prevent connections with APs or SDK users running previous versions of the software. Additionally, APs must look at the same chain of trust to be able to validate certificates for connections with any partner APs. Contact TripleBlind support if you have issues interoperating with a partner organization.

Other changes and bug fixes

Add more error checking to SDK installers.
Manage Details permissions are now required for all modifications to asset information. Previously an organization member without the permission could still edit mock data.
Mock data is no longer shown for “output assets”, such as a run of a Blind Report or Statistics results. The mock values shown in the web interface were confusing in these cases.
BUGFIX: Creating a database query asset with an invalid query left files locked under Windows. This was particularly noticeable when using Jupyter notebooks, as the file could not be re-opened without restarting the notebook, making it awkward to correct the query.
BUGFIX: Portal videos were being blocked by most browsers.
BUGFIX: Vertical Regression could report an incorrect mean square error (MSE) in certain situations.

Release 1.53.0 – October 26, 2023

Usability changes and Bug fixes

Swagger documentation is no longer presented in the portal to avoid confusion.
The non-PSI versions of several protocols have been removed since the PSI version has all the same functionality. These include: KMeans Clustering, Vertical Decision Tree training and inference, Vertical Regression training and inference.
Column type detection better handles missing/NaN values in numeric columns.
Improve the tb requests interaction
Regression inference now outputs probabilities rather than a binary 1/0 result. This matches the existing behavior of the Vertical Regression protocol.

Bug fixes

BUGFIX: The Asset.archive() API didn’t return a boolean result in certain situations.
BUGFIX: Access Point setup instructions had a typo (extra space) in a command for copy/pasting.
BUGFIX: XGBoost library version has changed from 1.7.6 to ensure compatibility on all platforms. Also adds special instructions for requirements installation on Apple M1 hardware in some examples.
BUGFIX: Mock data was not generated appropriately for binary 1/0 columns.
BUGFIX: Adding multiple spaces after a search term in the user interface would return no matches.

Security Updates

SECURITY: Address vulnerabilities from malformed email addresses provided for invitations.
SECURITY: User names can no longer contain these special characters: <>=!@#$%^&*/”?~
SECURITY: Updated several library dependencies to address known vulnerabilities (CVEs).

Release 1.52.0 – September 20, 2023

Web-based Blind Stats

The web user interface now allows you to run a Blind Statistics analysis without any Python code, with all the same privacy guarantees! The Create New Process page allows you to launch a Blind Stat. The Blind Stats wizard walks you through picking the dataset(s) you want to work with, then selecting the statistical operations you want to perform.

The operation still requires permission to access assets, either through Agreements or manual permission grants. Once the process completes, the results can be downloaded directly from your Access Point via a link in the Router interface, again retaining the same level of privacy guarantees offered by Python scripts.

Web Interface performance refinements

The functionality of Audit pages has been honed and refined, improving responsiveness.
The Q&A pages have been refined, improving responsiveness.

Other changes and bug fixes

Optimized the Swagger documentation generator.
The Sentiment Analysis, Redact and Knowledge Graph examples and protocols have been retired.
The SDK train() and predict() methods now raise exceptions when the operation fails instead of returning None.
BUGFIX: Filtering by owner in the web interface would ignore keyword search terms, instead displaying all owned assets.
BUGFIX: Parsing of the tb asset retrieve command would ignore any specified “Save As” name.
BUGFIX: Using the Stop functionality with a Jupyter Notebook cell while it is waiting on a remote action to complete would generate infinite error messages and eventually crash the browser.
SECURITY: Switched to Safetensor instead of Pickle in the image preprocessor serializer.
SECURITY: Resolved several potential security issues.

Release 1.51.0 – August 21, 2023

Python Preprocessor

Tabular data preprocessing can now be performed using Python code, in addition to the existing SQL technique. A script using the tripleblind library can call the new TabularPreprocessorBuilder.python_transform() method, pointing to the transform function to apply to a Pandas dataframe. Any Pandas or NumPy method can be used, but all other libraries are blocked. As with SQL transforms, these preprocessing steps are visible to the data owner for inspection before approving usage.

The Python code can be provided as either a string or, more conveniently, a filename containing the transform code. For example:

preprocess_b = (
    tb.TabularPreprocessor.builder()
    .all_columns(True)
    .dtype("float32")
    .python_transform("python_transform_script.py")
)

The python_transform_script.py must contain a transform(df: pd.DataFrame) function that returns a Pandas DataFrame, which is then used in the given operation. See the example/Data_Munging/2b_model_train.py script for further reference.
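As a rough idea of what such a script might contain, here is a minimal sketch. The transform function name and signature come from the description above; the column names age and income, the -1 sentinel and the cleaning steps themselves are invented for illustration. It uses only Pandas and NumPy, in line with the library restriction.

```python
# python_transform_script.py (illustrative contents, hypothetical columns)
import numpy as np
import pandas as pd


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Clean a dataframe before it is used in an operation."""
    out = df.copy()
    # Treat the sentinel -1 as missing, then fill gaps with the column mean.
    age = out["age"].where(out["age"] != -1)
    out["age"] = age.fillna(age.mean())
    # Add a derived feature using a NumPy function.
    out["log_income"] = np.log1p(out["income"])
    return out


if __name__ == "__main__":
    # Quick local check with made-up data.
    sample = pd.DataFrame({"age": [30, -1, 50], "income": [0.0, 1000.0, 2500.0]})
    print(transform(sample))
```

Running the file directly exercises the function on a small made-up dataframe, which can be a handy sanity check before testing it against real data.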

To support developing these scripts, the two new tb preproc create [asset] and tb preproc test [script.py] [data.csv] utility commands are available.

Web Interface refinements

Assets can now be deleted from the web interface.
The “FAQ” functionality is now known as “Q&A”.
Many small visual tweaks for consistency and usability.

Other

All tabular results now include a column header line.
Record identifiers can now be requested with inference output. This can be particularly helpful in validation of new models against ground truth.
Performance has been improved during some protocol evaluations.
Improvements to the Windows SDK installer to better account for the potential variety of host machines.
BUGFIX: Preprocessor command preview was only displayed to the first data owner when a single preprocessor was shared by all assets.
SECURITY: Alpha-numeric (plus underscore) tabular column names are now fully enforced.
SECURITY: Addressed various CVEs by updating dependencies.

Release 1.50.0 – July 6, 2023

Inference Identifier Columns

When running an inference the default behavior is to just deliver the result, ensuring identifying information isn’t leaked. However when testing inference correctness during model development, it is often necessary to tie the output back to the data via an identifier. Now the inference operation allows the invoker to request “identifier_columns” as part of the result output when working with tabular data. This allows the data owner to correlate the output back to the data they own.

NOTE: The inference result asset now includes column headers. If you open this as a dataframe, the Pandas header autodetection often fails and reads the first line as data instead of a header row. You can correct this by either using pandas.read_csv(FILENAME, header=0) or calling result.load(header=True) against the TripleBlind result asset.
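The Pandas option can be tried without the SDK. The sketch below uses a made-up two-column result held in a string (the column names and values are hypothetical) and contrasts header=0 with header=None, which reproduces the failure mode of treating the header line as data.

```python
import io

import pandas as pd

# Stand-in for a downloaded inference result; the column names are made up.
csv_text = "patient_id,prediction\n1001,0.82\n1002,0.13\n"

# header=0 tells pandas that the first line holds the column names.
with_header = pd.read_csv(io.StringIO(csv_text), header=0)

# header=None shows the failure mode: the header line is read as a data row.
without_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(list(with_header.columns), len(with_header))
print(list(without_header.columns), len(without_header))
```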

Consistent SDK tb.Operation usage

An early design decision resulted in the same operation using independent identifiers when creating an agreement and when used to start a process. In most cases these were at least similar, although a few differences could be confusing. Now all SDK code uses tb.Operation.NAME exclusively. This will require a change to some existing scripts that directly call create_job() such as:

job = tb.create_job(operation=tb.RECOMMENDER_TRAIN, params={...})

is now:

job = tb.create_job(operation=tb.Operation.RECOMMENDER_TRAIN, params={...})

For most scripts, simply adding the “.Operation” is all that is needed. However there are a few exceptions:

tb.REVEAL / tb.Operation.PRIVATE_QUERY are now tb.Operation.BLIND_QUERY
tb.ROI_DETECTOR_TRAIN is now tb.Operation.REGION_OF_INTEREST_TRAIN
tb.PSI_JOIN is now tb.Operation.BLIND_JOIN
tb.HEAD is now tb.Operation.BLIND_SAMPLE
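If you have many scripts to update, the renames above can be applied mechanically. The helper below is a hypothetical sketch, not a tool shipped with the SDK: it rewrites the listed exceptions plus any plain operation names you pass in (RECOMMENDER_TRAIN is used as the default because it appears in the example above). Review the result before committing it.

```python
import re

# Renames listed above that are more than just inserting ".Operation".
SPECIAL_RENAMES = {
    "tb.REVEAL": "tb.Operation.BLIND_QUERY",
    "tb.Operation.PRIVATE_QUERY": "tb.Operation.BLIND_QUERY",
    "tb.ROI_DETECTOR_TRAIN": "tb.Operation.REGION_OF_INTEREST_TRAIN",
    "tb.PSI_JOIN": "tb.Operation.BLIND_JOIN",
    "tb.HEAD": "tb.Operation.BLIND_SAMPLE",
}


def migrate_source(text, simple_names=("RECOMMENDER_TRAIN",)):
    """Rewrite old operation identifiers in a script's source text."""
    # Plain names only need ".Operation" inserted; list the ones your scripts use.
    renames = dict(SPECIAL_RENAMES)
    for name in simple_names:
        renames[f"tb.{name}"] = f"tb.Operation.{name}"

    # Match whole identifiers only, so tb.HEAD does not touch tb.HEADER.
    # Longer keys are tried first so the most specific identifier wins.
    alternatives = "|".join(
        re.escape(old) for old in sorted(renames, key=len, reverse=True)
    )
    pattern = re.compile(r"(?<![\w.])(" + alternatives + r")(?!\w)")
    return pattern.sub(lambda match: renames[match.group(1)], text)


old_script = "job = tb.create_job(operation=tb.RECOMMENDER_TRAIN, params={})\n"
print(migrate_source(old_script))
```

Running the helper a second time leaves already-migrated text unchanged, so it is safe to apply across a directory of scripts more than once.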

SDK

Added “mask” and “unmask” commands to the command line utility, for example: tb unmask "my dataset" "field1" "field2"
Improved responsiveness of tb request access request management.
The tb.ModelAsset.find() method now returns a ModelAsset object, not a generic Asset.
Improved efficiency of progress messages shown during the execution of an operation. It had been needlessly querying the Router multiple times a second.

Other

Various small tweaks to make the interface more intuitive and consistent.
BUGFIX: Statistical operations now auto-coerce data to float for all operations except “count”. Previously, a value such as “NaN” in a dataset could make the column be categorized as a string, preventing calculation of things like “mean” on the whole column.
SECURITY: Adjusted timing of response to avoid phishing attacks looking for emails with user accounts.
SECURITY: Addressed various CVEs by updating dependencies.

Release 1.49.0 – June 1, 2023

Enhanced CSV Import

The “comma separated value” (aka CSV) file is a workhorse in data science. Spreadsheets can be saved to this format, most tools can import and export them, and they are easy to examine manually. However their flexibility can be problematic with missing header lines, duplicate field names, field names that can’t be used by some tools, or corrupt data due to blank rows. The web interface and the SDK now provide more tools to make it easy to identify and fix problems like this.

The web UI now provides a preview of the import, allows custom separators, and catches invalid files with varying record sizes. The CSVDataset.position() method in the SDK also performs validations and provides tools to automatically rename invalid or duplicate field names.
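The kinds of checks described here can be approximated with Python's standard csv module. The sketch below is an independent illustration, not the validation code used by the web UI or by CSVDataset.position(): the function name and messages are hypothetical, it covers only some of the problems listed (invalid or duplicate field names, blank rows, varying record sizes), and the rule of letters, digits and underscores for field names is an assumption.

```python
import csv
import io
import re

# Assumed rule for this sketch: letters, digits and underscores only.
VALID_NAME = re.compile(r"[A-Za-z0-9_]+")


def check_csv(text, delimiter=","):
    """Return a list of human-readable problems found in CSV text."""
    problems = []
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return ["file is empty"]

    header = rows[0]
    seen = set()
    for name in header:
        if not VALID_NAME.fullmatch(name):
            problems.append(f"invalid field name: {name!r}")
        if name in seen:
            problems.append(f"duplicate field name: {name!r}")
        seen.add(name)

    # Row numbers assume one record per line (no quoted multi-line fields).
    for line_number, row in enumerate(rows[1:], start=2):
        if not row:
            problems.append(f"line {line_number}: blank row")
        elif len(row) != len(header):
            problems.append(
                f"line {line_number}: expected {len(header)} fields, found {len(row)}"
            )
    return problems


sample = "id,name,name,zip code\n1,Ann,Lee,64108\n\n2,Bob,64110\n"
for problem in check_csv(sample):
    print(problem)
```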

Streaming Inference Enhancements

The ability to produce a continuous stream of inferences at regular intervals was introduced in release 1.47. This has been enhanced in several ways:

Both parties involved in the operation (data and model provider) can now view the inference stream. See the SDK demo/Continuous_Inference/4c_stream_updates.py example.
The inference interval adjusts to account for any delay when acquiring data and for the inference calculation itself.
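One common way to keep a fixed cadence despite variable work time is to sleep until the next absolute deadline rather than for a full interval. The sketch below shows that idea in isolation; it is an assumption about the general technique, not the Access Point's scheduling code, and the function and parameter names are hypothetical.

```python
import time


def run_at_interval(task, interval, iterations, clock=time.monotonic, sleep=time.sleep):
    """Run task() every `interval` seconds, absorbing the time task() itself takes."""
    start = clock()
    for n in range(iterations):
        task()
        # Aim for the next fixed deadline rather than sleeping a full interval,
        # so slow data fetches or inferences do not push later runs back.
        next_deadline = start + (n + 1) * interval
        remaining = next_deadline - clock()
        if remaining > 0:
            sleep(remaining)


# Demo with a fake clock so the example runs instantly.
now = [0.0]
started_at = []


def fake_task():
    started_at.append(now[0])
    now[0] += 0.25  # pretend acquiring data and inferring took 0.25 s


def fake_sleep(seconds):
    now[0] += seconds


run_at_interval(fake_task, interval=1.0, iterations=3,
                clock=lambda: now[0], sleep=fake_sleep)
print(started_at)
```

In the demo each run starts exactly one interval after the previous one, even though every run spends a quarter of a second on simulated work.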

Model Factory

The SDK now has a ModelFactory for creating custom neural networks using standard architectures. While they can also be defined manually, the definition can be verbose with repeating layers, also making it easy to create subtle bugs. The ModelFactory.vgg() method can create a Visual Geometry Group style network with a single line of code. See the Cifar example’s 2_train.py step for details.

Web Interface

Granular email notification settings. See the “Notifications” tab under My Settings, found in the upper right user menu.
Performance optimizations, part of an ongoing effort to improve the user experience.
Blind Report parameters now appear in the order used at creation time rather than alphabetical order by parameter name.
BUGFIX: Audit reports could not be downloaded in some situations.
BUGFIX: Declined access requests could get stuck as “In Progress”.

SDK

Error messages are more verbose, part of an ongoing effort to improve the user experience.
Update some tutorial notebooks to use current best practices.
BUGFIX: Allow killing of “zombie” processes. These occur when an Access Point restarts in the middle of a running process and similar situations.

Release 1.48.0 – April 19, 2023

Preprocessor Data Cleaning and Normalization tools

Working with data which you cannot see can be tricky. The preprocessor now provides several ways to cleanse data using the TabularPreprocessor.handle_nan() method. With it, you can set missing or Not a Number numeric values to a fixed value; to a statistical value such as the max, min, mean, or median; or you can simply “drop” the row containing the missing value.

The TabularPreprocessor.replace(before, after) method allows replacing any value in the dataset before it gets used in an operation. This can be combined with the handle_nan() method to replace a value with a statistical property or drop it altogether.

# Drop any row containing missing values in any column
pre = tb.TabularPreprocessor.builder().all_columns(True).handle_nan("drop")

# Fill any missing value in column "A" with the mean of the values found in
# that column, and fill missing values in column "B" with the number 42.
pre = (
   tb.TabularPreprocessor.builder()
   .all_columns(True)
   .handle_nan({"A": "mean", "B": 42})
)

# Replace the value -1 in column "A" with NaN.  The handle_nan() method could
# be used to convert that into a replacement like the mean.
# Also change the value "M" to "male" in column "B".
pre = (
   tb.TabularPreprocessor.builder()
   .all_columns(True)
   .replace({"A": (-1, np.nan), "B": ("M", "male")})
)

Process Management

Users now have much better control over processes they have created. Administrators can also be granted the ability to control others’ processes with the new Process > Manage permission.

Within the web interface Active Process list, you can now Cancel any running or queued operation. This is also possible from the command-line interface using the tb process cancel command, or optionally by using Ctrl+C in an SDK operation which is waiting on the process to complete.

Additionally, the new tb process connect command allows you to reconnect and view any running process. This can be useful with long running operations such as model training. The SDK provides access to the process output stream via the StatusOutputStream object obtained from Job.get_status_stream() or returned by invoking Model.infer(stream_output=True).

Learning Rate Scheduler

Blind Learning and Federated Learning protocols can now define a learning rate scheduler to vary how quickly the model learns as the training progresses. You can specify either the CyclicLR or Cyclic Cosine Decay scheduler. See example/Cifar in the SDK for an example of how this works. If not defined, learning occurs at a constant rate.
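For readers unfamiliar with these schedules, the sketch below computes the learning rate at a given step for a triangular cyclic schedule and for a cosine decay that restarts each cycle. These are the textbook formulas under assumed parameter names; the exact variants and options supported by the SDK may differ, so treat the example/Cifar code as the reference.

```python
import math


def cyclic_lr(step, base_lr, max_lr, step_size):
    """Triangular cycle: ramp base -> max -> base every 2 * step_size steps."""
    cycle_position = step % (2 * step_size)
    # Distance from the peak, scaled to [0, 1]: 1 at the cycle ends, 0 at the peak.
    distance = abs(cycle_position / step_size - 1)
    return base_lr + (max_lr - base_lr) * (1 - distance)


def cosine_decay_lr(step, max_lr, min_lr, cycle_length):
    """Cosine decay from max_lr towards min_lr, restarting every cycle_length steps."""
    progress = (step % cycle_length) / cycle_length
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))


for step in range(0, 9, 2):
    print(step,
          round(cyclic_lr(step, 0.001, 0.01, step_size=4), 5),
          round(cosine_decay_lr(step, 0.01, 0.001, cycle_length=8), 5))
```

Both functions are pure, so a schedule can be inspected or plotted before it is used in training.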

Data positioning wizards for Azure Data Lake / Blob Storage, Amazon S3, and Amazon Redshift

The web user interface can now be used to create assets that reference data stored in several popular cloud databases, as well as simple CSV datasets. These are accessed via the New Asset button in the upper right when browsing assets.

Settings and Credential Management

User settings have been reorganized under the My Settings dialog found by clicking your name/avatar in the upper right corner. In addition to the general reorganization, you can now retrieve a pre-populated tripleblind.yaml when setting up your development environment or resetting your access tokens by visiting the Authentication tab and clicking Download tripleblind.yaml.

Web Interface

Revamp of the Agreements page to clearly show incoming and outgoing agreements for access to assets.
Support scientific notation for Blind Report parameters.
SECURITY: Email login is now case-insensitive.
BUGFIX: Blind Report would sometimes show links to operations which were not Blind Report runs.
BUGFIX: The CSV export of an Audit was not always working.

Release 1.47.0 – March 15, 2023

K-Means Clustering

K-Means Clustering is an algorithm to partition data into k non-overlapping subgroups (aka clusters) where each data point fits into only one of these subgroups. The algorithm attempts to partition so that the data in any single cluster is as similar as possible. During the process, the actual content of the data is not exposed to the other party. In the end only the cluster centers (means) are revealed along with an inertia value which indicates how well the clusters fit the raw data, plus information allowing the trainer to match their training input with the appropriate cluster classification. This can be performed on distributed datasets owned by two separate parties, with an optional private set intersection used to find matching data. Finally, inference can be performed with the trained model for predicting the cluster new data would fit into.
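The terms used above (cluster centers, inertia, and inference on new data) can be seen in a plain, non-private implementation of the standard algorithm. The sketch below is ordinary single-party K-Means (Lloyd's algorithm) in pure Python with made-up points; it shows what the outputs mean, not how the privacy-preserving protocol computes them.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_center(point, centers):
    """Inference: index of the cluster whose center is closest to the point."""
    return min(range(len(centers)), key=lambda c: squared_distance(point, centers[c]))


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm: returns (centers, labels, inertia)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct input points
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [nearest_center(p, centers) for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    labels = [nearest_center(p, centers) for p in points]
    # Inertia: total squared distance from each point to its assigned center.
    inertia = sum(squared_distance(p, centers[l]) for p, l in zip(points, labels))
    return centers, labels, inertia


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers, labels, inertia = kmeans(points, k=2)
print(centers, labels, round(inertia, 3))
print(nearest_center((2.0, 2.0), centers) == labels[0])
```

A lower inertia means the points sit closer to their assigned centers, which is why it serves as a measure of how well the clusters fit the data.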

Real-time Streaming Inference

The federated inference protocol now supports streaming inferences on a regular interval. This is a powerful combination when paired with a live database asset, enabling real-time predictions based on live data.

Blind Report in web interface

Blind Reports can now be run directly from the web interface. When a Report asset is visited, users can set the desired parameters and launch a report. The “Your Reports” section shows results of previous runs.

Reports are still defined using the SDK, and now more error checking is performed to prevent creating invalid reports.

Web Interface

BUGFIX: Mock data search bar and column rename weren’t functional.
Administrators can now initiate a password reset for other users
Add link to the Learning System from Help
Pages have better titles in browser history
Notifications are now turned off for new users by default
Tooltip explanations for permissions

SDK

BUGFIX: Blind Joins and Blind Reports can successfully return an empty result if no overlap or output is generated. Previously these operations ended with an error code.
A new set of Regression Asset classes simplifies training and inferring against a variety of regression model types.

Release 1.46.0 – February 1, 2023

Parameterized Blind Report

Predefined database-backed reports can now be defined, allowing variable but secure queries to be run safely on even the most sensitive data. For example, a report could be defined that allows a user to select a year and month along with a company division for generating salary statistics by ethnicity or gender for usage in compliance reporting. Any number of variables and any complexity of queries are supported. See the examples/Blind_Report for documentation and more information.
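The general idea of a parameterized report can be sketched with Python's built-in sqlite3 module. This is not the Blind Report API (see examples/Blind_Report for that); the table, the `run_report` helper, and the `ALLOWED` whitelist are hypothetical. The point is that the query text is fixed by the report author, and the person running it can only choose values for named parameters.

```python
import sqlite3

# Fixed query text: callers can only supply values for the named parameters.
REPORT_SQL = """
    SELECT gender, COUNT(*) AS employees, AVG(salary) AS avg_salary
    FROM payroll
    WHERE year = :year AND month = :month AND division = :division
    GROUP BY gender
    ORDER BY gender
"""

# Hypothetical whitelist of values the report author chose to allow.
ALLOWED = {
    "year": {2022, 2023},
    "month": set(range(1, 13)),
    "division": {"R&D", "Sales"},
}


def run_report(conn, **params):
    if set(params) != set(ALLOWED):
        raise ValueError(f"expected parameters: {sorted(ALLOWED)}")
    for name, value in params.items():
        if value not in ALLOWED[name]:
            raise ValueError(f"value {value!r} is not allowed for {name}")
    # Values are bound by the driver, never spliced into the SQL string.
    return conn.execute(REPORT_SQL, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payroll (year INT, month INT, division TEXT, gender TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO payroll VALUES (?, ?, ?, ?, ?)",
    [
        (2023, 1, "R&D", "F", 100000),
        (2023, 1, "R&D", "F", 110000),
        (2023, 1, "R&D", "M", 90000),
        (2023, 1, "Sales", "M", 70000),
    ],
)
rows = run_report(conn, year=2023, month=1, division="R&D")
```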

PSI + Vertical Regression

Regression training and inference can now be performed on vertically partitioned, distributed datasets. Additionally, PSI can be applied to allow operation only against common records, without revealing that overlap in the process. See the examples/Vertical_Regression and examples/PSI_Vertical_Regression for more information.

PSI + Vertical Decision Trees

Decision trees can now be trained on vertically partitioned, distributed datasets. This allows training models between organizations with unknown numbers of overlapping customers or records, without revealing the overlap in the process. See the examples/PSI_Vertical_Decision_Tree for documentation and more information.

Other

Changed login and password reset to operate using the user’s email address instead of a username.
Add tb ps cancel command for canceling queued processes from the command line.
The new optional unmask_columns= parameter can be used to unmask while positioning tabular datasets.
Add warehouse= parameter for Snowflake assets to select a Snowflake Compute Warehouse.
Various web interface refinements to the navigation menu and asset details pages.

Fixes

Security: Prevent an attacker from re-queuing a previously approved operation in certain situations.
Security: Prevent a potential password reset replay attack.
Security: Prevent certain combinations of preprocessor operations in Blind Join which could be potentially exploited and have no practical value.
Security: Highlight SQL_Transform preprocessor steps during Access Request approvals for additional scrutiny.

Release 1.45.0 – December 20, 2022

Decision Trees for Vertically Partitioned Data

Vertically partitioned data arises when multiple organizations hold different pieces of information about the same individual. For example, an individual might have general demographic information in one dataset, medical history in another dataset, and pharmacy records in a third dataset. With a common identifier, such as a social security number, datasets owned by different organizations can remain in place yet still be used as training data for regression or classification decision trees. See the examples/Vertical_Decision_Tree in the SDK.
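As a rough picture of vertical partitioning, the sketch below lines up three toy datasets on a shared identifier. It is plain Python with made-up records and a hypothetical `assemble_rows` helper. Unlike the TripleBlind protocols, it gathers everything in one place, so it illustrates only the shape of the combined rows, not how the data stays private.

```python
# Three parties hold different columns about overlapping individuals.
demographics = {"id-001": {"age": 54}, "id-002": {"age": 37}, "id-003": {"age": 61}}
medical = {
    "id-001": {"diabetic": True},
    "id-003": {"diabetic": False},
    "id-004": {"diabetic": True},
}
pharmacy = {
    "id-001": {"prescriptions": 3},
    "id-002": {"prescriptions": 0},
    "id-003": {"prescriptions": 1},
}


def assemble_rows(*parties):
    """Combine the columns for identifiers present in every party's dataset."""
    common_ids = set(parties[0]).intersection(*parties[1:])
    rows = {}
    for record_id in sorted(common_ids):
        combined = {}
        for party in parties:
            combined.update(party[record_id])
        rows[record_id] = combined
    return rows


training_rows = assemble_rows(demographics, medical, pharmacy)
```

Only the identifiers known to all three parties yield a complete row.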

Blind Join Refinement

The Blind Join protocol has been enhanced to better match expectations for those familiar with various types of SQL join operations. Now a user can choose from Left or Inner (traditional or Partitioned) join operations. See examples/Blind_Join and the blind_join documentation for more information.
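For readers less familiar with SQL joins, this small sketch contrasts Inner and Left join semantics on plain dictionaries. The `join` function is a hypothetical illustration, not the blind_join API, and it does not cover the Partitioned variant.

```python
def join(left, right, how="inner"):
    """Join two {key: {column: value}} tables on their keys."""
    right_columns = sorted({col for row in right.values() for col in row})
    result = []
    for key, left_row in left.items():
        if key in right:
            # Matched in both tables: kept by inner and left joins alike.
            result.append({"key": key, **left_row, **right[key]})
        elif how == "left":
            # Left join keeps unmatched left rows, padding right columns.
            padding = {col: None for col in right_columns}
            result.append({"key": key, **left_row, **padding})
    return result


customers = {"a": {"name": "Ann"}, "b": {"name": "Bob"}}
orders = {"a": {"total": 10}}

inner_rows = join(customers, orders, how="inner")
left_rows = join(customers, orders, how="left")
```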

Data Connectors

New Azure Blob Storage data connector.
Add chunking support for MongoDB to support larger-than-memory datasets.
Rename the examples/Database_Connectors to the more appropriate Data_Connectors.

SDK Ease of Use

In order to simplify data management and keep examples focused, several changes are being made:

Input assets have been renamed (typically beginning with “EXAMPLE - “) and the use of run_id.txt discontinued.
The new data_requirements.yaml file points to the source of data used for the example, typically under the examples/Assets directory.
Most asset-related error checking is eliminated when tb.initialize(example=True) is used.

This revamp has been done for around half of the examples and will be completed in the next release.

As part of this ease-of-use effort, parameters to protocols are now validated before usage. This allows for immediate and clear error messaging when a parameter is incorrect (such as an unknown or misspelled loss function during training) or a required parameter is missing.
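The kind of up-front validation described here might look like the following sketch. The parameter names, the list of known loss functions, and the error wording are all assumptions made for illustration; the SDK's actual checks and messages may differ.

```python
import difflib

# Hypothetical values, for illustration only.
KNOWN_LOSSES = {"BCEWithLogitsLoss", "CrossEntropyLoss", "MSELoss"}
REQUIRED = {"loss_name", "epochs"}


def validate_params(params):
    """Fail fast, before any work starts, with a message that names the problem."""
    missing = sorted(REQUIRED - set(params))
    if missing:
        raise ValueError(f"missing required parameter(s): {', '.join(missing)}")
    loss = params["loss_name"]
    if loss not in KNOWN_LOSSES:
        # Suggest the closest known name to help with misspellings.
        hint = difflib.get_close_matches(loss, sorted(KNOWN_LOSSES), n=1)
        suggestion = f" Did you mean '{hint[0]}'?" if hint else ""
        raise ValueError(f"unknown loss function '{loss}'.{suggestion}")


validate_params({"loss_name": "MSELoss", "epochs": 5})  # passes silently
```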

Web Interface

Organization owners can now require or disable the usage of 2FA for their users.
Revamped the presentation and owner management of data assets.

Other

Expand ONNX support with 20 new operators.
Support custom SSL trust chains, defined in system settings.
Split Enterprise Mode endpoints on the Access Point to a unique port, allowing for distinct security settings for internal users.

Fixes

Performing accept/deny of an access request via the tb requests command line utility worked correctly, but gave incorrect feedback.
Fixed the inference script in the LSTM example.
Adjusted the timeout on the Redshift connector to make it more reliable.
Blind Statistics could fail on data containing NaN (not a number) values.

Release 1.44.0 – December 5, 2022

Bookmarks

Users can now create and curate their own set of bookmarked Assets. These can be kept in a general set of bookmarks, or under a named collection. Clicking the icon in the upper-right of an asset view brings up a dialog to add or manage the bookmark. The new My Bookmarks page shows all you have marked.

Permission Assignments at Invitation Time

Administrators can now assign permissions while inviting the user. Previously the user had to accept the invitation first, requiring the administrator to grant permissions after this acceptance before the new user could do anything.

Asset Audit and Browsing Performance Improvements

Several optimizations were added to make response times for both the web interface and SDK operations more snappy. Additionally, the default date range for audits now covers the last seven days.

Fixes

Portal searches of the “preprocessor” and “tripleblind” module reference documentation were broken
Missing asset message improved
Improved messaging when information is requested from an Access Point that is offline
Email notifications now point to the support website instead of an email address
Added “last seen” to user details in administrative view
Exact searches from the tb utility could potentially return non-matching assets.
Asset details no longer available to users without visibility, even with the correct URL link
Updated several dependencies to latest known-secure versions.

Release 1.43.0 – November 2, 2022

Audit Asset overhaul

The 🔗Audit Asset Usage page has been significantly expanded. Three types of reports are now easily accessible on the tabs:

Activity Report - detailed list of all individual operations involving assets owned by you
Weekly Report - a summary of asset usage, showing daily and weekly total access counts
Usage by User - summary of users who have utilized your assets during a given time range

Data connectors

New data connectors make it easy to connect your data for usage within TripleBlind. Now you can connect an asset to:

Amazon Redshift
Amazon S3 Bucket
Azure Data Lake Storage

Additionally, examples of all connectors have been gathered into one location in the SDK. See the examples/Data_Connector subdirectory.

NOTE: The Snowflake example has been folded into this location.

Modeling Capabilities

Predictive Model Markup Language (PMML) support has expanded to include randomForest tree definitions.
Add SMPC GlobalAveragePool ONNX operator.

Asset Explorer facelift

The 🔗Explore Assets browser is easier to use than ever. A new search and filter bar docks at the top of the window as you scroll through assets, making it easy to refine. Additionally, the asset cards have been enhanced with clearer styling and information about the user who positioned it for owned assets.

SDK

A variety of changes have been made to make the SDK easier to use.

New method TableAsset.get_column_names()
Parameter validation
Better error messages.
Chunked uploads when positioning data

Fixes

Admin panel user search is now case insensitive
Fix incorrect conv_layer_A and conv_layer_B outputs when strides != [1, 1].

Release 1.42.0 – October 5, 2022

Two-Factor Authentication

Two-factor authentication (2FA) is now available under each user's 🔗My Account page in the Security Settings section. Once enabled, the user will be walked through the process to set up 2FA.

Web User interface

A handful of small but impactful changes have been made to the web interface:

The documentation Portal is now available to all! Links to documentation can be freely shared with colleagues.
The preprocessor steps are now reflected in the Asset Request for operations which utilize them.
Mock Data now supports generating mock Age, Sex, Date, and Time values.
Process creation now allows going backwards in the setup steps.
Links and emails now point to the new TripleBlind support site.

Installers

Mac/Linux: Previous installs of Anaconda would break the TripleBlind installer due to conflicts between Anaconda and the Mamba solver. Now a warning is shown and the slower Conda solver is used if Anaconda is detected.
Mac (M1): The installer was giving a false failure message.

SDK

The blind_join() method now accepts a custom preprocessor, which will override the default.
When the `allow_overwrite` param is set during asset positioning, an existing private asset of the same name will now be archived.

Security

Tighten security settings for session id and CSRF token cookie.
Web logins now expire after 48 hours. Users will be prompted to log in again after expiration.
Closed a potential XSS vulnerability which could allow JavaScript code to be executed if a JSON packet containing it was viewed directly in a browser (unusual, but possible).

Release 1.41.0 – August 31, 2022

ONNX Support

The Open Neural Network Exchange format (.onnx) is now supported for positioning neural networks. This format has the most complete set of features and is now the preferred format for working with TripleBlind, although basic PyTorch (.pth) and Keras (.h5) model support will remain. See the Pretrained_NN_Inference example (replacing the older Network_Provisioning example) to see how to work with all three model types.

Additionally, a new network compatibility tool has been added to the tb utility. It can be invoked using tb validate [DIRECTORY | FILENAME] and will flag networks which use unsupported layers or features.

Access Requests

The Access Request page now allows viewing of full job detail, including parameters included on requests. This gives data owners the information needed to fully understand what is being proposed as a process with their dataset.

Additionally, Audit logs now record denied requests.

Knowledge Graph Random Forest

A new and powerful network training technique has been added to the toolkit. Knowledge Graphs can represent complex interactions between disparate data nodes. An example of this technique in action to train and use a network to detect money laundering can be seen in the Knowledge_Graphs example.

SDK

The new tb.ModelAsset class makes using a trained neural network or statistical model as easy as a single line. Example usage:

import tripleblind as tb

# Look for a model Asset ID from a previous training
trained_network_id = tb.util.load_from("model_asset_id.out")
trained_network = tb.ModelAsset(trained_network_id)

result = trained_network.infer(
   "test.csv",
   preprocessor=tb.DocumentPreprocessor.builder().document_column("TEXT"),
)
print("Inference Results:")
print(result.table.dataframe)

Security

Several new security features have been implemented this release:

Password strength meter and complexity requirements
Resolved CVEs in several common libraries

Release 1.40.0 – August 3, 2022

Python 3.9 Update

The default SDK environment has now been updated to Python 3.9.13, along with appropriate updates to other support libraries. In addition to the new syntax and base library features for the Pythonistas out there, this also improves the security posture as Python 3.7 approaches its end of life.

Mock Data Editor

Mock Data is now customizable by data owners directly from the asset profile screen in a WYSIWYG (what you see is what you get) way rather than the old method under Asset > Manage > Mock Data.

PSI extensions

Several improvements have been made to the PSI functionality:

Independent preprocessors can be specified for each input dataset in a PSI Join.
Multiple match columns can be specified for PSI Vertical operations.
Improve PSI memory usage efficiency when working with massive datasets.

Security Updates

HTTP Strict Transport Security (HSTS) is now enforced in response headers
Use HMAC instead of Hash for access point challenge response
Employ Server-Side Request Forgery (SSRF) protection for access point ping from the Router
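To illustrate the HMAC item above, here is a generic challenge-response sketch using Python's standard hmac module. The function names and key handling are assumptions, not the Access Point's actual implementation. A keyed HMAC avoids weaknesses of ad hoc constructions such as SHA-256(key || message), which is open to length-extension attacks.

```python
import hashlib
import hmac
import secrets


def make_challenge():
    """Verifier side: a fresh random challenge for every attempt."""
    return secrets.token_bytes(32)


def respond(shared_key, challenge):
    """Prover side: a keyed digest of the challenge."""
    return hmac.new(shared_key, challenge, hashlib.sha256).digest()


def verify(shared_key, challenge, response):
    """Verifier side: constant-time comparison against the expected digest."""
    expected = hmac.new(shared_key, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)


key = secrets.token_bytes(32)  # hypothetical pre-shared secret
challenge = make_challenge()
accepted = verify(key, challenge, respond(key, challenge))
rejected = verify(key, challenge, respond(b"wrong key", challenge))
```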

Other updates

Add the ‘sha256()’ method for use in sql_transform() operations.
Masking settings are now enforced in Blind Query operations, preventing preprocessor and other potentially misleading requests for data use from being submitted.
Blind Query now honors the k-Grouping setting, rejecting output that doesn’t cross the threshold.
BUGFIX: `tb --version` would crash
BUGFIX: Agreement search in the GUI gave inconsistent results
BUGFIX: Asset Explorer search in the GUI gave inconsistent results
BUGFIX: Correct grammar in registration/password reset emails.

Release 1.39.0 – June 29, 2022

Distributed Regression

Training a regression model on distributed datasets is now supported via the tb.TableAsset.fit_regression() method. Multiple datasets can be assembled into a set of single virtual records via a private set intersection (PSI) on a unique key value found in all of the datasets. The resultant fitted model can also be used to make predictions against distributed datasets. See the PSI_Regression example.

Dataset k-Grouping Setting

All data assets now have an associated “k-Grouping” setting. Based on the concepts of k-anonymity, this value is used in aggregation methods to make sure no group smaller than the setting is reported in aggregations. This helps reduce probing for information about individuals based on outside knowledge of some characteristic of that individual, commonly referred to as a linkage attack. Initially this applies to the statistics methods in Blind Stats.
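A minimal sketch of the idea, assuming a simple grouped count: any group with fewer than k members is withheld from the output. The `grouped_counts` helper and the sample records are hypothetical and are not the Blind Stats implementation.

```python
from collections import Counter


def grouped_counts(records, column, k):
    """Count records per group, withholding groups smaller than k."""
    counts = Counter(record[column] for record in records)
    return {group: n for group, n in counts.items() if n >= k}


records = (
    [{"zip": "64108"}] * 7
    + [{"zip": "64111"}] * 5
    + [{"zip": "64199"}] * 2  # small enough to single people out
)
report = grouped_counts(records, "zip", k=5)
```

Here the two-person group is dropped, so someone with outside knowledge of a resident of that zip code learns nothing about them from the report.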

Access Point Enterprise Mode

The TripleBlind network architecture is very flexible, but enterprise security architecture sometimes calls for less flexibility. An Access Point can now be configured for enterprise mode which results in two things:

API calls made by SDK scripts will route through the Access Point rather than calling the Router’s API endpoints directly. This reduces the necessary exception in external firewalls to the single port 443 route to/from the Access Point.
Users must be provisioned at the Access Point, providing a tight “multiple-signature” access control that is entirely in the hands of the enterprise IT department which hosts the Access Point.

Learn more about this in the Enterprise Access Point Setup and in the Access Point Administration guides.

Security Enhancements

Authentication tokens expire after 6 months. All existing auth tokens will need to be regenerated by visiting the 🔗My Account page.
Authentication tokens can no longer be shown, only refreshed.

SDK enhancements

New tb.TableAsset.mask_columns() and tb.TableAsset.unmask_columns() methods.
New tb.Requests.accept_all() and tb.Requests.deny_all() methods.
SDK network calls are more resilient to slow and unreliable network connections.
Improved SDK support for Windows and M1 based Macs.

Other updates

The technique used to securely average models during blind learning has been significantly improved, decreasing time for each average by 15x.
Output from the Blind Sample is no longer “re-masked” when viewed in the Router interface.
The dedicated Median operation has been deprecated, and the Median example in the SDK has been removed. This functionality is now available as part of Blind Stats, along with various other ordered and descriptive statistics (see the Statistics example in the SDK).

Release 1.38.0 – May 26, 2022

Statistics Support

The term “order statistics” refers to the class of basic statistical operations which require ordering the data. This includes many common operations, such as min, median and max. This is now possible with distributed datasets without sharing any of the data amongst the participants, retaining perfect privacy. The full list of supported statistics is:

Min
Max
Median
Quartiles
Mean
Variance
Standard deviation
Skew
Kurtosis
Count
Confidence interval
Standard error

Additionally, stratification of samples is supported through a grouping function. A type of k-anonymity guarantee is included, which blocks calculations that involve fewer records than the k value (currently hard-coded to 5).
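For reference, several of the listed statistics can be computed locally with Python's standard statistics module, as in the sketch below. This runs on data held in one place, so it shows only what the values are, not how they are computed privately. The helper names are hypothetical, variance and standard deviation use the sample (n - 1) definitions, and skew, kurtosis and confidence intervals are left out for brevity.

```python
import math
import statistics


def describe(values):
    """Descriptive and order statistics for a list of at least two numbers."""
    n = len(values)
    stdev = statistics.stdev(values)
    return {
        "count": n,
        "min": min(values),
        "max": max(values),
        "median": statistics.median(values),
        "quartiles": statistics.quantiles(values, n=4),
        "mean": statistics.mean(values),
        "variance": statistics.variance(values),
        "stdev": stdev,
        "standard_error": stdev / math.sqrt(n),
    }


def describe_by_group(records, group_key, value_key):
    """Stratified statistics: one summary per value of group_key."""
    groups = {}
    for record in records:
        groups.setdefault(record[group_key], []).append(record[value_key])
    return {name: describe(vals) for name, vals in groups.items()}


summary = describe([2, 4, 4, 4, 5, 5, 7, 9])
```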

Mock Data Editor

The ability to control how Mock Data appears has been greatly enhanced with the new Mock Data Editor. A sensitive field can now be marked as masked (hiding the content of the raw data behind a “mask” whenever it is made visible), and those masks can be customized. For example a field could be assigned a Full Address mask so a very realistic looking address such as “250 Brooks Radial Suite 868 Meyerview, NE 95054” would be displayed. Several dozen of these masks exist, allowing for very realistic example and sample data. You can see this in the Blind_Sample SDK example and the Create New Process > Blind Sample.

Preprocessors

Several new preprocessor mechanisms are being developed and are available in this release as Early Access. Currently, these wrap Scikit-Learn transformers and address typical preprocessing needs: One-Hot Encoding, Ordinal Encoding, and Binning. Each is wrapped using OneHotEncoder, OrdinalEncoder, and KBinsDiscretizer respectively. See the Random_Forest example which has been modified to use both the OneHotEncoder and the KBinsDiscretizer preprocessing data transformers.
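The three transformations can be pictured with the dependency-free sketch below. It is not the TripleBlind preprocessor API, and it does not call Scikit-Learn; the helpers are hypothetical stand-ins that mimic what each transformer does. One difference worth noting: the binning helper uses equal-width bins, whereas Scikit-Learn's KBinsDiscretizer defaults to quantile-based bins.

```python
def one_hot(values):
    """One column per category, with a 1 marking the row's category."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


def ordinal(values):
    """Replace each category with its index in sorted order."""
    index = {c: i for i, c in enumerate(sorted(set(values)))}
    return [index[v] for v in values]


def equal_width_bins(values, n_bins):
    """Assign each number to one of n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


colors = ["red", "green", "blue", "green"]
categories, color_matrix = one_hot(colors)
color_codes = ordinal(colors)
age_bins = equal_width_bins([18, 25, 40, 62, 90], n_bins=3)
```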

Security Enhancements

A handful of visible and under-the-hood changes are being made to further limit the likelihood and potential damage of security breaches:

Accounts lock after 6 consecutive failed password attempts (prevents brute-force attacks)
Administrators are emailed whenever a user account is created
A set of Permissions are now associated with users to restrict the ability to retrieve organization assets. The four permissions are:

1. Algorithm Assets > Retrieve Asset
2. Algorithm Assets > Retrieve Result
3. Dataset Assets > Retrieve Asset
4. Dataset Assets > Retrieve Result

The "Retrieve Asset" permission applies to any asset which was Positioned(). The "Retrieve Result" permission applies to assets which are output from an operation.

Other updates

Add ‘--details’ flag support for the tb ps list command to show the full date/time
Masking rules now apply to output fields for SDK runs of Blind_Join to match the web interface

Release 1.37.0 – May 5, 2022

Web Interface

The web interface has changed significantly. This includes organization improvements and new functionality. Notable changes include:

Add an Agreements Overview page
Access Point health indicator
Enhanced data positioning
Bulk Masking

Documentation Overhaul

The documentation portal has been revamped with both a new look and new content. It should now be quicker to navigate and a more comfortable read.

Enhanced CNN Inference support

Extended supported CNN layers to include: conv3d, max_pool1d, adaptive_avg_pool2d, adaptive_max_pool2d, leaky ReLU. Also adjusted defaults for all existing layers in the Network_Builder where the defaults varied from PyTorch.

SDK management of Requests and Processes

The SDK and tb command-line utility have been expanded to include management of Requests and Processes. Query and manage outstanding Access Requests, accepting or rejecting via tb requests and the new tb.Request object. Query process status via tb process and the new tb.Job.find() method.

Other updates

Security fix: The registration and password reset processes have been revamped to address weaknesses identified during penetration testing.
CSV handling has been hardened to support exceptional cases (infinite values, missing values, etc).
SDK support for Apple M1
The SDK no longer holds default tokens. Contact your customer success manager if you need assistance obtaining tokens to run tutorials.
Enhanced logging and log viewing.

Release 1.36.0 – February 23, 2022

Setup Guide

The process of setting up a new Access Point has been greatly simplified.

When logging in to the organization owner account for the first time, if no Access Point is set up you will be presented with a guide for Access Point installation.
The “Setup a Server” instructions provide server requirements and pointers to Azure, AWS, and GCP documentation, much like in previous releases.
The “Setup your Access Point” instructions guide you through a greatly simplified process leveraging a new tbadmin Command Line Interface (CLI) and Validation Server. The process can now be completed with just two commands taking just minutes start to finish.

NOTE: The TripleBlind Private Data Sharing Solution is now available in the Azure Marketplace! Through this subscription you can set up an appropriately sized VM to host your Access Point.

SMPC Inference Improvements

During the course of a pilot validation with a pretrained neural network, an unexpected variance in accuracy between “native” and SMPC inference was noted with the new model. This led to a deep review of the SMPC inference.

ENHANCEMENT: Encoding scheme used for weights is much more flexible. Previous models encountered all had internal weight values that were fractional values (e.g. 0.1 or 0.002), but this new model included some values that were nearly 50,000. A new scheme was implemented to better manage this large variance in magnitude.
BUGFIX: Even kernel padding was being handled slightly differently in conv2d layers than in PyTorch.
Added parameters data_scale_factor, weight_scale_factor, minimum_data_value and minimum_weight_value for manual encoding overrides, although this should rarely, if ever, be needed.
Documented behavior of the “model_output” and “final_layer_softmax” parameters in the Network_Provisioning/2_remote_inference.py example.

As a result of these changes, the SMPC inference result variance versus raw inference was brought down from an average variance of 5% to less than 0.0012% in the new model. Additionally the accuracy of existing test models improved from 0.00018 to 0.000154 by default, with custom parameters allowing up to two further orders of magnitude of accuracy at 0.000004.
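Why a wide spread of weight magnitudes is awkward can be seen in a generic fixed-point sketch. This is an assumption-laden illustration, not TripleBlind's encoding scheme: the 32-bit integer range and the `encode` and `decode` helpers are hypothetical, and the scale factors are arbitrary. A small scale factor erases tiny weights, while a large one overflows on big weights.

```python
INT_MAX = 2**31 - 1  # hypothetical representable integer range


def encode(value, scale_factor):
    """Fixed-point encode: multiply by the scale factor and round to an integer."""
    fixed = round(value * scale_factor)
    if abs(fixed) > INT_MAX:
        raise OverflowError(f"{value} does not fit at scale {scale_factor}")
    return fixed


def decode(fixed, scale_factor):
    return fixed / scale_factor


# A small scale factor loses a tiny weight entirely...
lost = decode(encode(0.002, 100), 100)
# ...while a large scale factor preserves it...
kept = decode(encode(0.002, 10**6), 10**6)
# ...but the same large factor cannot hold a weight near 50,000.
try:
    encode(50_000, 10**6)
    overflowed = False
except OverflowError:
    overflowed = True
```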

Router User Interface

Ongoing refinement of the user interface has resulted in polish throughout, plus these notable changes:

The Blind Join operation now only allows “Unmasked” fields to be returned as output. Masked fields can still be used for the join, but not for final reporting.
The user’s Authentication Token is now found under My Account (formerly Profile) when clicking on the user icon in the upper-right. Previously it was found under My Organization (formerly Settings).
BUGFIX: Changing only an asset name (and nothing else) would not stick.
BUGFIX: The New Token button wouldn’t work in certain situations.

SDK updates

Two new demos illustrate using a fuzzy search and a Blind Join in healthcare-focused situations. The techniques are virtually identical to the examples, the only difference being the fictional datasets are designed to look like simple patient databases.
Referenced example support files (e.g. comma separated value files with fake data) now are hosted on tripleblind.app instead of shared Google Drive links. This should be easier for organizations with strict firewall limits to outside resources.
Add tb --version command to easily view the version of the local SDK, your organization’s Access Point, and any available updates.
Switched from “conda” to “mamba” for environment management. This should be transparent, but makes the installation of the SDK significantly faster.
Transitioned to “FED” instead of “AES” terminology. The "AES" description for the alternative to SMPC operations wasn't clear to many people. The term "FED" (short for Federated Operations) is familiar to most from the Federated Learning technique, which involves a decentralized group working together to achieve an end. Now the parameter is security="fed" for operations where the algorithm is not hidden, but the data is still fully protected. As before, all transmission of information is still fully protected by SSL/TLS with AES-256 encryption used to protect the content; only the value name has changed. “AES” is still supported as a synonym, so existing scripts will continue to work.
Corrected quirky behavior in the tutorial notebooks. Previously portions wouldn’t run correctly, depending on the operating environment.
Revamp of search methods. The Asset.find_dataset() and Asset.find_algorithm() methods have been deprecated, although still available as Asset._find_dataset() and Asset._find_algorithm(). In their place are the cleaner Asset.find() and Asset.find_all() methods which support the optional parameters dataset=True and algorithm=True. Finally, a bug was fixed which prevented exact regex matches using a trailing “$”.

Usage now looks like this:

import datetime

import tripleblind as tb

# Search for a single asset; multiple matches will throw an exception
asset = tb.Asset.find("/^rege.$/")  # match 'regex' or 'reget', but not 'regert'
data = tb.Asset.find("mydata", dataset=True)  # match datasets only
alg = tb.Asset.find("myalg", algorithm=True)  # match algs only

# Search for a pattern with multiple matches
today = datetime.date.today().isoformat()
for a in tb.Asset.find_all("somepattern", dataset=True):
    if a.name.endswith(today):  # match names ending in today's date (YYYY-MM-DD)
        print(a.name)  # do something with these new datasets
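The effect of the trailing “$” can be checked with Python's own re module, independent of the SDK. The `matching` helper and the names are made up for illustration, and whether the SDK evaluates patterns exactly this way is an assumption.

```python
import re


def matching(pattern, names):
    """Names in which the regular expression finds a match."""
    regex = re.compile(pattern)
    return [name for name in names if regex.search(name)]


names = ["regex", "reget", "regert"]
anchored = matching(r"^rege.$", names)   # "$" requires the name to end here
unanchored = matching(r"^rege.", names)  # without "$", longer names match too
```

With the anchor, only the five-character names match; without it, "regert" matches as well.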

Other updates

Blind Samples now include column names as the first row.
Eliminated several potential CVE threats

Release 1.35.0 – January 12, 2022

Access Point Capability Flags

Recognizing the varying needs of different organizations, we have added the ability to set limits on accessible capabilities when setting up an Access Point. The new TB_CAPABILITIES_FLAG environment variable currently controls access to the Blind Join, Private Query, Redaction and data masking functionality. The default is the most conservative mode, which disables those features unless they are explicitly allowed. This ensures these features are enabled deliberately and managed so that sensitive data is not inadvertently shared.

For existing Access Points, you can access these capabilities by:

Log in to the Access Point host machine (as an administrator)
Edit the tripleblind.env Docker configuration file
Add: TB_CAPABILITY_FLAGS=1
Restart the Access Point

Blind Sample

The new Blind Sample functionality allows a data consumer to obtain synthetic data in order to get a feeling for the shape of the private data. This synthetic data can also be downloaded and used offline to perfect processes before working Blindly on real data.

This capability can be accessed in the web interface via the Create New Process page, or invoked via the SDK as shown in the new Blind_Sample example.

PMML Model Support

The Predictive Model Markup Language (PMML) standard allows defining statistical models and processes. These can now be treated as algorithmic Assets, allowing outside groups to use the model for generating predictions without exposing the data to the model owner or the model to the data owner.

The new PMML example shows using this capability to define and infer against a model. Initially, we support logistic regression models against NumPy formatted data.

Web Interface Improvements

New look for the landing page and loading indicator
New look for the Access Request page
Algorithm Asset profiles are now visible in the Asset explorer, giving the type of model as well as the shape of input and output data.
The Process History now allows clicking into a completed process to view information about the operation and parameters, retrieve results, as well as see error messages for failed processes.

Okta Login support

For organizations using Okta for user access management, you can now add TripleBlind access via an Okta tile. Contact us for more information on integrating into your environment.

Other updates

SECURITY: Raise connection requirements from SSLv23 to TLS_1.2 for all services
Enhanced preprocessor language. The preprocessor now supports all SQLite syntax for data munging.
Optimized SMPC operations to boost speed significantly, especially on slower network connections.
Revamp of the xgboost_regression example
BUGFIX: Error messages from certain failed processes could expose a single raw data element. Ex: “Failed to convert to int: ‘Bill Jones’”. Now the failed data element is stripped from the reported error.
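For client code that connects to TLS services, the same minimum can be enforced on the caller's side. The sketch below uses Python's standard ssl module; the `make_client_context` helper is a hypothetical name, and this is a generic illustration rather than how TripleBlind services are configured.

```python
import ssl


def make_client_context():
    """Client-side TLS context that refuses anything older than TLS 1.2."""
    context = ssl.create_default_context()  # certificate and hostname checks on
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_client_context()
```

The returned context can be passed to `http.client.HTTPSConnection` or `urllib.request.urlopen` through their `context` argument.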

Release 1.34.0 – December 1, 2021

Web interface: Agreements

The list of asset agreements is now more informative, with a better summary. Plus:

Rather than "Permissions: Pre-approved" and "Permissions: Required" terminology, the more intuitive and consistent "Approval: Automatic" and "Approval: Manual" are used.
Context menu on agreements, allowing copy and delete.
Algorithm agreements can be created in web interface

Algorithm Profile

Algorithm assets now have a Profile tab. Conceptually similar to a Data Profile report on a dataset, this allows viewing basic information about the given algorithm. For example, a neural network can show the expected shape of the input data, algorithms trained within TripleBlind can report model accuracy during testing, etc.

Other updates

Moved the Invite action in the web interface to the 🔗Users page.
Improved error capture during operation launch.
Binary SMPC inference can now output raw predictions, in addition to results rounded to 0.0 or 1.0.
SDK’s XGBoostModel object now has predict()/predict_proba().
SDK download is now hosted directly by tripleblind.app.
Security: Removed unused component that used an older version of glibc with a CVE.

Release 1.33.0 – October 27, 2021

Web-based Calculations

The web user interface now allows you to define and run a Blind Join without any Python code with all the same privacy guarantees! The 🔗Create New Process page allows you to launch a Blind Join as an initial operation.

The Blind Join wizard walks you through picking the datasets you want to work with, selecting the match fields, and then choosing the desired outputs.

Other Web Interface Enhancements

Email Share for assets
All assets now have a Share menu item under the ⁝ utility menu in the upper right of asset cards or detail views. You can use this to easily communicate with your collaborators and bring their attention to a specific asset.
Result Download
The output from previous operations is available under the My Processes page on the History tab. Final results can be seen, and output assets (such as a trained model) can now be directly downloaded from there.
Data Profile refresh
Each data asset automatically generates an Exploratory Data Analysis (EDA) report when created. For live datasets (those connected to a database) the data profile can change after that initial analysis report. Data owners can now click a refresh button on their assets to produce a fresh report.

Under the hood and security updates

Reduced memory usage in Federated Learning.
Updated many dependencies to latest known-secure versions.
Various minor sanitization and path traversal prevention changes.
Improved validation and error handling when the shape of data presented to a neural network does not match.
Single-use password resets (previously they were valid for 24 hours).
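The single-use behavior in the last item can be sketched generically as follows. The `ResetTokens` class, its one-hour lifetime, and the storage in a dictionary are assumptions for illustration and do not describe the Router's implementation. The key point is that redeeming a token removes it, so a second attempt with the same link fails.

```python
import hashlib
import secrets
import time


class ResetTokens:
    """In-memory store of password reset tokens that can be redeemed only once."""

    def __init__(self, ttl_seconds=3600):  # hypothetical lifetime
        self._ttl = ttl_seconds
        self._pending = {}  # sha256(token) -> (email, expiry time)

    @staticmethod
    def _digest(token):
        # Store only a hash so a leaked store does not expose usable tokens.
        return hashlib.sha256(token.encode()).hexdigest()

    def issue(self, email, now=None):
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(32)
        self._pending[self._digest(token)] = (email, now + self._ttl)
        return token

    def redeem(self, token, now=None):
        now = time.time() if now is None else now
        # pop() removes the entry, so the token cannot be replayed.
        entry = self._pending.pop(self._digest(token), None)
        if entry is None or now > entry[1]:
            return None
        return entry[0]


store = ResetTokens()
token = store.issue("user@example.com", now=0)
first_use = store.redeem(token, now=10)
second_use = store.redeem(token, now=11)
```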

SDK Updates

Revamped the Transfer_Learning example to use current best-practices
Revamped the local documentation included in the SDK. Reference documents are now housed online, which is pointed to by the local doc.

Release 1.32.0 – October 6, 2021

Process Manager

The web user interface now provides a better view of organization-wide processes as well as tools to manage them with the 🔗My Processes page.

Under "My Processes" are tabs showing the operations running or queued to run, a tab to show the operations awaiting permission from another user, and a tab to hold the operations that have been run previously.
A Process can be canceled while in the queue or while waiting for permission. Once it has begun, the process cannot be halted.
"Usage of My Assets" is now "Audit Usage" under the Assets

Email Notifications

Organization members can now get notification emails when items need their attention or when an action completes. Currently notifications are generated when an Access Request is pending, when a FAQ question is submitted about an owned asset, or when a submitted FAQ question is answered.

Notifications can be enabled and disabled on the Profile page, along with the email address used for the notification.

Dataset Creation in Web Interface

Enhanced the “Create New Asset” capability in the web interface in several ways:

Added support for drag-and-drop selection of data files (e.g. CSV files)
Simplified the process to occur on one page instead of a multi-step wizard
Datasets will now be created in the “Package” format, just like in the SDK

Misc Updates

Assets are now pre-positioned for the Random_Forest and Outlier_Detection examples.
Ongoing refinements of the Access Point setup process.
Updated online tutorials to point to the install.sh/install.bat scripts in the SDK
BUGFIX: Very large training operations could have network failures or timeouts.

Release 1.31.0 – September 9, 2021

Question and Answer / FAQ

Users are now able to submit questions to asset owners. These questions can be answered by the asset owner privately or publicly as a Frequently Asked Question.

User Registration Refinements

A handful of minor changes have been made to the user invitation / registration process to make it more intuitive and foolproof. For example, error messages after an invitation link expires are more explicit.

Example revamp

The examples generally followed the sequence: create data, position data on multiple access points, perform an operation. This was a good end-to-end sequence, but it has also caused some confusion since it wasn’t always clear that the example was playing the part of multiple parties.

Several examples have been revamped to use pre-positioned datasets rather than creating their own data. This currently includes Tabular_Data and Cifar, but will be an ongoing process.

SDK Updates

Added new tb.Asset.find() and tb.TableAsset.find() methods. These cast the results appropriately (making code completion more useful). There is now also an owned_by=ORG_ID parameter which allows better specification for searches. These methods will also throw an exception if a search returns multiple values, limiting inadvertent matches of the wrong asset.
Added new tb.util.save_to("name") and tb.util.read_from("name") utilities for stashing and retrieving simple values between scripts.
Added an org_id property to tb.Session objects.
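
The "exactly one match" behavior described above can be pictured with a small, self-contained sketch. It does not call the SDK: AssetRecord, MultipleMatchesError, find_unique and the sample catalog are hypothetical stand-ins, and returning None when nothing matches is an assumption rather than documented SDK behavior.

```python
import re
from typing import NamedTuple, Optional, Sequence


class AssetRecord(NamedTuple):
    """Hypothetical stand-in for an asset listing entry (not an SDK class)."""
    name: str
    org_id: int


class MultipleMatchesError(Exception):
    """Raised when a search pattern matches more than one asset."""


def find_unique(
    assets: Sequence[AssetRecord],
    pattern: str,
    owned_by: Optional[int] = None,
) -> Optional[AssetRecord]:
    """Return the single matching asset, None if there is none, else raise."""
    regex = re.compile(pattern)
    matches = [
        a for a in assets
        if regex.search(a.name) and (owned_by is None or a.org_id == owned_by)
    ]
    if len(matches) > 1:
        raise MultipleMatchesError(f"{len(matches)} assets match {pattern!r}")
    return matches[0] if matches else None


catalog = [
    AssetRecord("bank_a_transactions", org_id=12),
    AssetRecord("bank_b_transactions", org_id=34),
]
# Narrowing by owner turns an ambiguous search into a unique one.
only_b = find_unique(catalog, "transactions", owned_by=34)
```

Without the owner filter the same pattern matches both records and raises, which is the kind of inadvertent match the new methods are meant to surface.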

Misc Updates

Add automatic heartbeat mechanism to keep connections alive for long running operations.
Better handling of datasets with missing or “NaN” values.
Security updates (upgrades to support libraries).

Release 1.30.0 – August 18, 2021

Web interface

Revamped the https://eval.tripleblind.app web interface based on user feedback. The previous navigational distinction between Algorithms and Datasets > Explore has been removed, combining them into the single Explore Assets page. This page also allows a user to filter the view to only those owned by the organization, or those assets which they have created.

Functionality has also been rearranged into Audit and Manage sections, along with styling updates. Feedback is always welcome!

Output Package format

Up to this point, the output of any given protocol has been determined by the protocol itself. Now output is always a package (.zip) file format. The package will contain the generated output, such as a trained model, along with additional metadata about the output. This also allows protocols to return additional information, such as train/test results.

Examples have been updated to reflect this change.
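
As a rough illustration of the idea, the sketch below bundles an output file and its metadata into one .zip and reads them back using only the Python standard library. The member names model.bin and metadata.json and the metadata keys are assumptions made for this example; the actual layout of a TripleBlind output package is not described here.

```python
import io
import json
import zipfile
from typing import Tuple


def build_package(model_bytes: bytes, metadata: dict) -> bytes:
    """Bundle an output file and its metadata into one in-memory .zip."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as pkg:
        # Both member names are hypothetical.
        pkg.writestr("model.bin", model_bytes)
        pkg.writestr("metadata.json", json.dumps(metadata))
    return buffer.getvalue()


def read_package(package_bytes: bytes) -> Tuple[bytes, dict]:
    """Extract the output file and the metadata from a package."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as pkg:
        model = pkg.read("model.bin")
        metadata = json.loads(pkg.read("metadata.json"))
    return model, metadata


package = build_package(b"\x00\x01weights", {"test_accuracy": 0.91})
model, info = read_package(package)
```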

SDK Updates

Added non-interactive (-ni) flag to the Windows SDK installation script
Reduced load time of the tripleblind library by 75% or more
Shifted the storage of datasets used in examples into the example’s directory instead of a different data_for_examples sibling directory. This makes it a little clearer what is being used in each example.
The tb utility now has --owned and --quiet flags. Also improved error reporting in various unusual situations.

Misc Updates

Removed the ability for an organization owner to disable their own admin access
User interface speed optimizations. In particular the /api/assets (and the related /api/datasets and /api/algorithms endpoints) are significantly faster. Various unused portions of the response were trimmed in this process.
Performance optimization on API endpoints

Release 1.29.0 – July 29, 2021

BERT Sentiment Analysis

Add support for Google's 🔗BERT for sentiment analysis. This technique supports training on top of the default English language model as well as inference to generate sentiment scores from blocks of text. See the demos/Sentiment_Analysis/BERT_Fine_Tune folder.

SDK Install Scripts

To simplify the install/upgrade process for SDK users the SDK now contains install.sh (for Mac/Linux) and install.bat (for Windows). These scripts walk you through installing Anaconda if needed, creating a new ‘tripleblind’ environment, installing the dependencies and updating your user token in the tripleblind.yaml file.

Access Point Setup Wizard

A new installation flow includes a wizard and startup Docker image which validates host resources and network connectivity before launching the Access Point installation.

Bank Transaction Prediction Demo

A new demos/Bank_Transaction_Prediction demo illustrates collaboration between three financial institutions in a real-world scenario. The training shows munging data in a preprocessor to deal with mismatched column names and data that needs to be converted in various ways for each dataset. This demo also includes the sequence in both Python and R scripts.

Misc Updates

Release Notes are now available via the portal, including information on historical releases.
Router interface has a consistent Copy icon for when an asset ID is presented.
Asset detail views are now a tabbed interface, better supporting Mock Data and Data Profile display.
Asset editing now all occurs under the Manage screen.
Changed CsvPreprocessor to TabularPreprocessor. The older name is still supported, but deprecated.

Release 1.28.0 – June 30, 2021

Sentiment Analysis

The new TableAsset.sentiment_analysis() tool allows private classification of natural language text, producing a sentiment score. The score is a value between -1.0 and 1.0 indicating overall positivity or negativity of the text. You can see this in operation in demos/Sentiment_Analysis.

Preprocessor Data Munging

The preprocessor now supports calculation and/or filtration using the preprocessor.sql_transform() statement. This allows new fields to be generated, unit conversion and many other types of data manipulation against private datasets before being fed into a training operation. Filtration can also occur at this level. The Data_Munging example now shows how this can be used to run a linear regression against a subset of a data asset combined with another data asset holding metric instead of imperial units.
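
To give a feel for the kind of statement involved, here is a minimal sketch that runs a unit-converting, filtering SELECT against an in-memory SQLite table. It only illustrates the SQL idea; the table, column names and the use of SQLite are assumptions, and the SQL dialect accepted by preprocessor.sql_transform() may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER, height_in REAL, weight_lb REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [(1, 70.0, 154.0), (2, 60.0, 110.0), (3, 75.0, 220.0)],
)

# Convert imperial units to metric and keep only a subset of the rows.
TRANSFORM = """
    SELECT id,
           height_in * 2.54       AS height_cm,
           weight_lb * 0.45359237 AS weight_kg
    FROM readings
    WHERE height_in >= 65
    ORDER BY id
"""
rows = conn.execute(TRANSFORM).fetchall()
```

After a transform like this, a dataset recorded in imperial units lines up with one recorded in metric units, so both can feed the same regression.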

R Support Preview

The SDK includes a preview of an experimental feature, allowing the Python SDK to be used by the R programming language. The R_Support folder contains examples of scripts that can be used within R Studio to obtain a Blind Sample of a dataset, perform a private Fuzzy Search, perform Private Set Intersection, and finally to train a neural network on top of tabular data. These examples are based directly on Python examples found in the SDK with the same folder names.

Misc Updates

Add optional --max= parameter to the tb utility to limit output.
Improve reporting of errors across multiple access points
Support mixed GPU / CPU-only Access Points for training

NOTE: We have identified a problem with XGBoost operation under pip-installed environments, although we do not have a root cause yet. If you need to use XGBoost, installing the TripleBlind library via Conda using the environment.yaml works properly.

Release 1.27.0 – June 16, 2021

Enhanced Agreements User Interface

The previous release introduced SDK support for advanced, fine-grained Agreements for sharing assets with other organizations. This release adds a user interface on the Router for defining and managing your agreements. See the Manage button on an asset you own.

MongoDB Support

Added support for assets based on MongoDB databases. The new tb.asset.MongoDatabase class provides this support, as illustrated in the new demos/MongoDB example. This example demonstrates using a MongoDB-based asset to train an XGBoost model.

Improved Swagger Support

Updated the Swagger tooling to match the OpenAPI 3.0 (OAS 3.0) standard, providing more and better information for people and tools using the Swagger documentation.

Sample Data “Unmasking”

The sample data editor now has a new tool, allowing non-sensitive columns to be “unmasked” to illustrate what actual random (non-synthetic) data looks like. This is particularly useful for illustrating non-sensitive string data, showing random samples rather than nonsensical generated strings.

Data Munging

Added the Data_Munging example, which illustrates a data owner using SQL to feature engineer in several ways:

Creating a daily average from high and low values
Executing a low pass filter to produce a moving average
Reclassifying entries, converting textual classes to numeric values
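
The three techniques above can be sketched in a single query. This is not the code from the Data_Munging example; the weather table, its columns and the three-row window are invented for illustration, and the window function requires SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (day INTEGER, high REAL, low REAL, sky TEXT)")
conn.executemany(
    "INSERT INTO weather VALUES (?, ?, ?, ?)",
    [(1, 20.0, 10.0, "sunny"), (2, 24.0, 12.0, "cloudy"),
     (3, 18.0, 8.0, "rain"), (4, 22.0, 14.0, "sunny")],
)

QUERY = """
    SELECT day,
           -- daily average from the high and low values
           (high + low) / 2.0 AS daily_avg,
           -- three-day moving average, a simple low pass filter
           AVG((high + low) / 2.0) OVER (
               ORDER BY day ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS moving_avg,
           -- textual classes converted to numeric codes
           CASE sky WHEN 'sunny' THEN 0 WHEN 'cloudy' THEN 1 ELSE 2 END AS sky_code
    FROM weather
    ORDER BY day
"""
rows = conn.execute(QUERY).fetchall()
```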

Release 1.26.0 – May 21, 2021

Sample Data

Tabular data assets now support the ability to produce 10 records of sample data. By default the sample data is randomized, but the data owner can use the new Edit Sample button on the detail screen to access the Sample Data Editor. From there, the data owner can:

See the randomly generated sample data.
Manually edit any value in any field of the sample data.
“NULL” a sample field to better represent datasets with missing values.

Once a sample has been published, other organizations can see it in the Router’s explorer view or pull it down programmatically using the TableAsset.get_sample() method.

NOTE: This slightly changes the behavior of TableAsset.get_sample(). If you use that method, check the documentation for more details. Additionally, the tb.HEAD operation is now deprecated.

Recommendation Engine

A new federated recommendation engine allows an organization to suggest the right product or service based on historical behavioral data, all using distributed, private techniques. The new Recommendation_Model example illustrates using user movie ratings distributed across two organizations to build a model, followed by predicting the top 10 movie matches for users based on this model.

Enhanced Agreements

The Agreement system has been significantly revamped. Now it includes:

Agreements limited by dates or number of uses
Fine-grained usage Agreements between organizations tied to a specific operation – either a predefined action, like Blind Learning; or executing a custom algorithm, like a trained neural network.
Collaborative agreements which apply to current and future datasets.
Limited publication of private assets, allowing an Asset to be seen by only a specific organization but still requiring explicit usage approval.

The new agreement system is used throughout the SDK examples. Any existing usage of Asset.create_agreement() will need to transition to Asset.add_agreement(), such as this:

    asset_1.add_agreement(
        with_org=34,  # partner Organization ID
        operation=tb.Operation.RANDOM_FOREST_TRAIN,
    )

instead of the previous:

    asset_1.create_agreement(to_org=34)

EDA for live datasets

The mechanism used to generate Exploratory Data Analysis (EDA) reports now supports data assets connected to a database (e.g. an SQL view of a database).

SDK

Improved the neural network architecture used in the Cifar example.
The CSVDataset.create() method now allows explicit definition of column names for datasets with no header row. See the Blind_Sample example.
BUGFIX: In certain situations involving only one Access Point (e.g. the XGBoost_Regression example) the JWT token could cause an uncaught exception. Now it is caught and dealt with cleanly in the background.

Release 1.25.0 – April 29, 2021

Fuzzy Match

The PSI Join capability introduced in the last release has been expanded to include support for both exact and “fuzzy” matches (using a Jaro-Winkler distance). Additionally, a TableAsset.blind_join() method has been added to support both operations.

The Fuzzy_Match example now shows both exact and fuzzy matching.
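
For readers unfamiliar with the metric, the following is a plain-Python sketch of the standard Jaro-Winkler similarity (1.0 means identical strings, and the corresponding distance is one minus the similarity). It is a textbook formulation, not the SDK's implementation, and the 0.9 threshold in is_fuzzy_match is an arbitrary value chosen for this example.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings, in the range 0.0 to 1.0."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to four characters."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1 - score)


def is_fuzzy_match(a: str, b: str, threshold: float = 0.9) -> bool:
    """Treat two keys as matching when their similarity reaches the threshold."""
    return jaro_winkler(a, b) >= threshold
```

With this formulation, "MARTHA" and "MARHTA" score roughly 0.961, so they would count as a fuzzy match at the 0.9 threshold even though an exact match fails.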

Demos

A new demos directory has been added to the SDK. These are distinguished from the examples in that they focus on a specific use case rather than an illustration of a particular method or technique. Demos will often use methods that already exist in an example, but the demo will illustrate using it to solve a specific problem scenario. Each demo might have its own requirements.txt file for unique dependencies of the demo.

The first demo is a Drug_Trial. This shows using a Blind Join to allow a pharmaceutical company to peek at the results of a running drug trial without having to reveal the placebo doses to anyone at the test sites. This demo also shows how blind results can still be used to produce valuable visualizations.

Access Point manager

The user Settings page has been enhanced in several ways:

The current status of the Access Point can be gathered at a glance by the visualizations.
The Speedtest button starts an analysis of connection speed
The Details button shows information about the operation similar to the Unix top command.

Bug Fixes

Changes in release 1.24.0 caused the status displayed during Redact operations to be incorrectly formatted.
Password reset now operates off of the username rather than the associated email address. This resolved issues where the same person has accounts with multiple organizations.
Certain informational APIs could be called without authentication. While not inherently dangerous, this was changed so that all API calls require authentication (except the login APIs) for consistency.
Operations involving the same Access Point playing the role of multiple parties (typically not a real-world usage) could experience timeouts from conflicting JWT tokens.

Release 1.24.0 – April 14, 2021

PSI Join protocol

The new PSI Join allows two organizations to privately compare entries in datasets (via a common key), and then return selected information when a match is found. For example, a piece of PII (like a social security number) could be used to match entries, but then non-sensitive information could be returned from one dataset. The new PSI_Join example demonstrates this to allow a railway to share statistics with a retail store about the most popular stations among their shared customers.
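
The sketch below shows only the shape of the result in the railway scenario: entries are matched on a sensitive key, and nothing but a non-sensitive summary from one dataset is returned. It is deliberately a plain, non-private join on made-up data, with hypothetical field and function names. The actual PSI Join protocol performs the matching privately, and that part is not shown.

```python
from collections import Counter
from typing import Dict, List

# Made-up records; "ssn" stands in for any sensitive common key.
railway = [
    {"ssn": "111-11-1111", "station": "Central"},
    {"ssn": "222-22-2222", "station": "Harbor"},
    {"ssn": "333-33-3333", "station": "Central"},
    {"ssn": "444-44-4444", "station": "Airport"},
]
retail = [
    {"ssn": "111-11-1111"},
    {"ssn": "222-22-2222"},
    {"ssn": "333-33-3333"},
    {"ssn": "999-99-9999"},
]


def join_and_summarize(
    left: List[Dict[str, str]],
    right: List[Dict[str, str]],
    key: str,
    return_field: str,
) -> Counter:
    """Count values of return_field in left for rows whose key is also in right."""
    right_keys = {row[key] for row in right}
    return Counter(row[return_field] for row in left if row[key] in right_keys)


popular_stations = join_and_summarize(railway, retail, "ssn", "station")
```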

User Permission Management

Access Point administrators now have finer-grained control over the capabilities of users in their organization. Administrators can use Access > Users to see permissions for each user, including:

User
- Manage: Able to manage user permissions
- Invite: Able to invite new users to join
- Remove: Able to delete existing users
Jobs
- Create: Able to start a new job / task via the SDK
Algorithm / Dataset
- Grant permission: Able to give other organizations access
- Publish: Able to list an asset on the Router’s Index
Agreement
- Manage: Able to create or change standing access agreements
Asset
- Update: Able to change the listed properties of assets

Private Docker Registry

New Access Points set up using the scripts from the setup wizard will use a private Docker registry (registry.tripleblind.app). This allows authentication via the JWT tokens from the Router, giving a cleaner experience for new and upgrading users.

SDK

Running task statuses are now handled more efficiently
Failed tasks now automatically display the failure message
Feedback messages show more meaningful text during operations rather than the meaningless Communication count.
Several enhancements to the tb utility:
Search has been unified with the GUI, making results exactly the same.
BUGFIX: delete command alias was not working

Release 1.23.0 – March 25, 2021

User Interface

Began a transition to a new visual framework (Material UI), which gives the interface a more modern look and feel and makes it more mobile friendly.

Moved Profile, Logout and Settings to the upper-right corner, under the user icon
Users can now reset their access token under Settings in the Authentication Token section.
Cleared email field after inviting a user to join
BUGFIX: On a small screen, Asset IDs could overflow the containing card

Access Point Setup Process

Enhanced the wizard used when setting up an Access Point to make the process easier for non-standard installations.

SDK

Enhanced the Snowflake example to create a stage area during the setup steps and to move the credentials to a YAML file.
Added error aggregation. During multi-party operations, only errors from the user’s side were being reported. Now errors from all involved parties get aggregated and made available for debugging.
Silent operations now show a message if they are waiting for a permission grant.
New XGBoostModel.train() method, simplifying the interface to XGBoost. See the XGBoost and XGBoost_Regression examples.
New TableAsset.get_sample() method, simplifying the interface to Blind Sampling. See the Blind_Sample example.

Private Model Averaging

Blind Learning and Federated Learning now use a new SMPC averaging algorithm. The server can no longer see any individual client model during training, which adds privacy for the client models. In Blind Learning, the new averaging also lets the protocol train more accurate models when some label classes are not present on all clients.
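
One generic way to average values without revealing any single contribution is additive secret sharing, sketched below for a single model weight. This is a textbook illustration under stated assumptions (a 61-bit prime modulus, fixed-point encoding with six decimal places, honest parties), not a description of the algorithm TripleBlind uses.

```python
import secrets
from typing import List

PRIME = 2**61 - 1  # modulus for the shares (assumed for this sketch)
SCALE = 10**6      # fixed-point scaling so float weights become integers


def share(value: float, n_parties: int) -> List[int]:
    """Split one value into n additive shares that sum to it modulo PRIME."""
    encoded = round(value * SCALE) % PRIME
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((encoded - sum(shares)) % PRIME)
    return shares


def decode(total: int) -> float:
    """Map a field element back to a signed fixed-point number."""
    if total > PRIME // 2:
        total -= PRIME
    return total / SCALE


def private_average(client_weights: List[float]) -> float:
    """Average the weights while only ever combining random-looking shares."""
    n = len(client_weights)
    # Each client splits its weight; party j receives the j-th share of each.
    all_shares = [share(w, n) for w in client_weights]
    partial_sums = [sum(s[j] for s in all_shares) % PRIME for j in range(n)]
    # Only the combined total is decoded, never an individual weight.
    return decode(sum(partial_sums) % PRIME) / n


average = private_average([0.5, -1.25, 2.0])
```

Each share on its own is a uniformly random number, so a party holding one share per client learns nothing about an individual weight. Only the sum, and therefore the average, is reconstructed.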

Under the Hood

Continuing some ruggedization work, the Access Point job management has been beefed up to handle unusual failed operations better.

Release 1.22.0 – March 3, 2021

Transfer Learning

The new Transfer_Learning example illustrates the process of bringing a pretrained model to the platform and augmenting it with data from another organization.

Private Queries

The new Private_Query example demonstrates how the Access Point can act as a privacy and compliance tool even for simple SQL queries. Summary reports can be easily created and exposed to outside organizations. These reports can be requested on an ad hoc basis, but the underlying data remains invisible and securely within control of the data owner.

Blind Sampling

The Blind_Sample example shows the new ability to request artificial sample data that looks like the private dataset. Values are random, but have similar characteristics to the actual data – integer values are in the same range, decimal values have the same number of places of accuracy, strings are the same general length.
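
The characteristics listed above can be mimicked in a few lines of plain Python. The sketch below is only an illustration of the idea: synthesize and decimal_places are hypothetical helpers, not SDK functions, and the real Blind Sample operation may derive its values quite differently.

```python
import random
import string
from typing import List


def decimal_places(x: float) -> int:
    """Number of digits after the decimal point in the value's repr."""
    text = repr(x)
    return len(text.split(".")[1]) if "." in text else 0


def synthesize(column: List, rng: random.Random) -> List:
    """Produce random values that resemble the column without copying it."""
    if all(isinstance(v, int) for v in column):
        # Integers stay within the same range.
        return [rng.randint(min(column), max(column)) for _ in column]
    if all(isinstance(v, float) for v in column):
        # Decimals keep the same number of places.
        places = max(decimal_places(v) for v in column)
        return [round(rng.uniform(min(column), max(column)), places) for _ in column]
    # Strings become random letters of the same length.
    return [
        "".join(rng.choice(string.ascii_lowercase) for _ in range(len(str(v))))
        for v in column
    ]


rng = random.Random(0)
fake_ages = synthesize([23, 41, 67], rng)
fake_prices = synthesize([9.99, 120.5], rng)
fake_names = synthesize(["Ada", "Grace"], rng)
```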

TableAsset enhancements

The TableAsset allows SDK users easy access to results and tabular datasets they own. It now has several new methods to make it easier to use:

TableAsset.load() Retrieves and loads a data asset, cached in memory.
TableAsset.pretty_print() Print the contents of the table in a nice format
TableAsset.dataframe property The table as a Pandas dataframe.

Additionally, a convenient new job.result.table property makes it simple to work with tabular results from operations.

Under the Hood

This release includes quite a few “under the hood” changes that might not be noticeable, but they would be noticed if they weren’t in place. These include:

Quick EDA. A rapid Exploratory Data Analysis report is now created based off of the first record in a dataset. This has limited information to protect privacy, but does provide basic information about the number of fields and records, field names, and field types. Once that has been generated, a full EDA starts and will replace the quick report once it has completed. For extremely large datasets the full EDA might not be possible, in which case the quick report will remain with basic information.
Improved Access Point memory management
Ruggedized the status reporting mechanism

SDK

The manage_asset.py and position.py scripts have been replaced by the new tb.py all-in-one command line asset management tool. It now supports date filtering, --exclude inverse regex searches, compact and detail asset listing modes, and more. Use the --help command to see all the options.
Add Asset.create_agreement() method
Add Asset.organization, Asset.organization_id, Asset.activate_date and Asset.metadata properties.
Add tb.util.print_in_columns() utility for displaying large datasets easily and compactly to match the size of the screen.
Remove the “Table Compare” operation and example. It was extremely specialized and used unique methods that conflicted with other features, so we have retired it.

Easy Access Point Upgrade

The groundwork was laid in the 1.21.0 release for this feature. If you upgraded your access point to 1.21.0 previously, you should now be able to use the Upgrade button on your Settings page. The process will now be as easy as a button click and only take a few minutes to complete. If you have any issue with this, please 📨 let us help.

Bug Fixes

Several listed datasets had bad links to images.
Asset.download() would use the wrong default filename in certain situations.

Release 1.21.0 – February 17, 2021

Multimodal AI

The new CT_Vertical_Partition example in the SDK demonstrates multimodal training on vertically partitioned Dicom (CT scan) images and CSV data. It includes an example of distributed inference after the model is built.

Distributed Linear Regression

Added the Gene_Regression example in the SDK to demonstrate the ability to train Linear, Logistic, Ridge or Lasso regression models on data distributed across multiple owners.

Private Data Regex Search

A new private data operation allows search and summarization using regex strings. See the Table_Search example.
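
A rough picture of "search and summarization": scan a text column with a regular expression and report only aggregate counts, never the rows themselves. The sketch uses Python's re module on made-up data; search_summary and the notes column are hypothetical, and the options of the actual operation are documented in the Table_Search example.

```python
import re
from collections import Counter
from typing import Dict, List


def search_summary(rows: List[Dict[str, str]], column: str, pattern: str) -> Counter:
    """Count how often each regex match occurs in one column of a table."""
    regex = re.compile(pattern)
    counts: Counter = Counter()
    for row in rows:
        for match in regex.finditer(row[column]):
            counts[match.group(0)] += 1
    return counts


records = [
    {"notes": "prescribed aspirin after visit"},
    {"notes": "ibuprofen and aspirin as needed"},
    {"notes": "no medication"},
]
summary = search_summary(records, "notes", r"\b(?:aspirin|ibuprofen)\b")
```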

Web Interface

The main web interface and the developer portal now have Documentation / Router links joining them. Login has also been centralized.

Easy Access Point “Upgrade”

Add a button on the Router under the Settings page which allows administrators to upgrade their Access Points via a button push. The Health check will show when an update is available.

NOTE: You will still need to follow the manual update procedure for this release to deploy the infrastructure needed for this feature.

SDK

Status updates now include more information during the execution of a protocol than just the “communications count”. For example, during inference the name of the layer being executed is displayed.
Add support for multiple preprocessors for working with vertically partitioned data.
Add support for connecting to datasets in a BigQuery cloud warehouse.
The TableAsset now has wrappers for regex search (TableAsset.search()) and outlier detection (TableAsset.detect_outlier()).

Datasets are now created using purpose-built classes. The hierarchy of classes is shown below:

Asset
    DatasetAsset
        TabularDataset
            DatabaseDataset
                BigQueryDatabase
                SnowflakeDatabase
            CSVDataset
    AlgorithmAsset
        NeuralNet

Use <type>.create() to build one of these in Python, e.g.

    import os
    import tripleblind as tb

    # Credentials are read from environment variables rather than hard-coded.
    asset = tb.asset.SnowflakeDatabase.create(
        os.environ["SNOWFLAKE_USERNAME"],
        os.environ["SNOWFLAKE_PASSWORD"],
        os.environ["SNOWFLAKE_ACCOUNT"],
        "my_database_name",
        query="SELECT * FROM sant_train LIMIT 500 OFFSET 200;",
        name="sant_snowflake",
        desc="Dataset housed in a Snowflake data warehouse.",
    )

Bug Fixes

A minimum character count on the Search box for datasets / algorithms caused confusing behavior.
The Audit log was not displaying references to used Datasets, only algorithms.
The “desc” field was being ignored when creating new assets.

Release 1.20.0 – January 29, 2021

LSTM support

Added ability to build and train LSTM models. Added the LSTM example to illustrate building a generative text model from distributed text datasets. Inference is also illustrated, using both AES and SMPC security models.

Federated Learning

Add a training protocol to support Federated Learning. We still believe Blind (Split) Learning is a superior solution in both speed and quality of the model, but it is now possible to train via Federated Learning if preferred. See the Federated_Learning example.

Redact operation

Added a new tb.REDACT operation which performs a blind redaction of columns within a tabular dataset. The redaction includes Name and Date types of text, replacing the value with Xs. See the Redact example.
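
The effect of redaction can be sketched as follows: chosen columns are replaced by Xs, and date-like substrings inside free text are masked in place. This is a simplified stand-in, not the tb.REDACT implementation. It assumes ISO-formatted dates and explicitly named columns, and the helper names are hypothetical.

```python
import re
from typing import Dict, Iterable

# Only ISO dates such as 2021-01-29 are recognized (assumption of this sketch).
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def mask(text: str) -> str:
    """Replace every character with an X, keeping the original length."""
    return "X" * len(text)


def redact_columns(row: Dict[str, object], columns: Iterable[str]) -> Dict[str, object]:
    """Return a copy of the row with the named columns fully masked."""
    redacted = dict(row)
    for column in columns:
        redacted[column] = mask(str(row[column]))
    return redacted


def redact_dates(text: str) -> str:
    """Mask date-like substrings inside free text."""
    return DATE_PATTERN.sub(lambda m: mask(m.group(0)), text)


row = {"name": "Ada", "note": "Seen on 2021-01-29.", "score": 7}
clean = redact_columns(row, ["name"])
clean["note"] = redact_dates(str(clean["note"]))
```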

Snowflake Database Assets

Added a new example of creating a database asset which points to a Snowflake database. See the Snowflake example folder.

Web Interface

Add link from home page to the Portal
Add introduction to the Examples in the Tutorial sequence
Revamp and enhance reference documentation for content and usability
BUGFIX: The “Reporting” page would sometimes show as blank, not showing the proper Audit log.
BUGFIX: EDA profiles were not generated on packaged CSV datasets

SDK

Add support for creating and managing Agreements programmatically
Support Jupyter Lab for tutorial notebooks

Release 1.19.0 – January 15, 2021

Development Portal

Overhauled tutorials. The write-ups are now done in Jupyter notebooks which are published statically in the portal and included in the SDK for interactive use.
Revamped and enhanced reference documentation for content and usability

Database View Assets

Add a new asset preprocessor package to define tabular database views. The database is specified by a connection string, and an SQL query defines the view.

Web Interface

New “Upgrade Guide” with copy/paste commands to update Access Points
New “Custom” option in Access Point setup wizard for non-standard setups
Add display of Access Point network statistics for ping and throughput in Settings
Add a filter field to the Agreements

SDK

Distinguish “tutorials” from “examples”. Examples are now grouped in the examples/ directory, and the tutorials are implemented as interactive Jupyter notebooks.
Remove find_asset.py, since the manage_asset.py utility makes it redundant.
When training, the generated network definitions are now considered temporary and are automatically deleted after completion. A persist=True parameter can be used with create_network() if the definition needs to be retained for some reason. This fixes the “this network already exists” annoyance some users experienced when running examples.
Added several new utility functions for convenience
tb.util.script_dir()
tb.util.set_scriptdir_current()
tb.util.read_run_id()
Added support for creating and managing Agreements
Added support for --verbose flag to assist in debugging SDK configuration
The Object_Detection example now downloads test data as part of the sequence
BUGFIX: Names of packaged datasets were misleading, missing the .zip extension and breaking some protocols

Release 1.18.0 – December 21, 2020

Distributed Inference Protocol

Implement the analog of Blind Learning on vertically split data, allowing AES inference against data spanning multiple organizations.
See the example Vertical_Partition, 3b_aes_inference.py

Region of Interest Protocol for Images

Implement training and AES inference protocols for image features, one or many polygons per image.
Allow local visualization of identified regions
See new Object_Detection example.

User Interface Enhancements

Organization owners can now invite others in their organization to sign up via the new Settings > Invite Users feature. This starts a self-service signup flow.
Add Job List search by: id; result; algorithm, dataset or protocol name
Limit CSV dataset profiling to Pearson’s R graphs as a speed/memory optimization

Development Portal

Add online reference documentation, previously only in the SDK package.

SDK

Add decorrelation loss (dloss) to the Image_Data example
Asset.download() now streams directly to disk, no longer requiring extra memory for large assets.

Optimizations

Reduced communications needed for SMPC inference, leading to 50% run time speed up.

Bug Fixes

Internal job token could expire during certain long-running SMPC calculations
Prevent network timeouts during extremely long-running AES inferences.
Remove non-functional “Create Agreement” button from detail view of algorithms owned by other organizations.
Correct issue which could cause services to be used from a different cloud service provider, resulting in unnecessary latency and SMPC slowdown.

Release 1.17.0 – December 19, 2020

Enhanced Large Scale Operation

Inference now supports batching and can chunk input data that is larger than memory.
In-memory limitations removed on uploads, allowing terabyte sized datasets
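
The general pattern behind batching input that is larger than memory is to stream it in fixed-size chunks, as in this minimal sketch. The model here is any callable that maps a batch of rows to a list of predictions. The helper names and the batch size are illustrative and not part of the SDK.

```python
from typing import Callable, Iterable, Iterator, List


def batched(rows: Iterable, batch_size: int) -> Iterator[List]:
    """Yield successive lists of at most batch_size rows from any iterable."""
    batch: List = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Final, possibly shorter batch.
        yield batch


def predict_in_batches(
    model: Callable[[List], List],
    rows: Iterable,
    batch_size: int,
) -> Iterator:
    """Run the model one batch at a time so only one batch is held in memory."""
    for batch in batched(rows, batch_size):
        yield from model(batch)


def double(batch: List[int]) -> List[int]:
    """Toy stand-in for a model."""
    return [x * 2 for x in batch]


predictions = list(predict_in_batches(double, iter(range(5)), batch_size=2))
```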

Packaging and Preprocessor

Add reusable packaging of CSV, Image, Dicom and raw (numpy) datasets
All training dataset packages can include labels inside the package
Deprecated PKL support, eliminating a security threat. Arbitrary data can now be represented in the serialized numpy dataset.
Auto-packaging of single files, such as an image file or a CSV, when a path is given in a dataset parameter.

Speed Optimization

SMPC is 10-50% faster, depending on the latency between Access Points.

Pretrained Model Support

Neural networks, such as a Keras .H5 file, can be loaded onto the platform pretrained. See the Network_Provisioning example.

Outlier Detection operation

New tb.OUTLIER_DETECTION operator can identify outlying data elements in CSV columns using statistical methods such as z_score. See the Outlier_Detection example.
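
As a reminder of how a z-score test works, the sketch below flags values that lie more than a chosen number of standard deviations from the column mean. It uses only the standard library. The function name and threshold are assumptions for illustration, and the parameters of the actual operator are described in the Outlier_Detection example.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def zscore_outliers(values: Sequence[float], threshold: float = 3.0) -> List[int]:
    """Return the indices of values whose absolute z-score exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing can be an outlier
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


column = [10, 11, 9, 10, 12, 10, 11, 95]
flagged = zscore_outliers(column, threshold=2.0)
```

In a very small sample a single extreme value inflates the standard deviation, so the common threshold of 3.0 can miss it. The eight-value column above needs a lower threshold such as 2.0 to flag the last entry.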

User Interface

My Listings allows sorting by name or date and can filter by name and type
New setup wizard for Access Points
Add full-text search for developer documentation

SDK

Revamped the manage_asset.py utility. Options are more consistent, downloads include a progress bar, and confirmations were added during Archive (delete)
Support for Preprocessor and Package builder
Output is now delivered as an asset, making it easier to process, more flexible and more secure
Add binary f1 and roc_auc scores, via the binary_metric parameter.
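
For reference, the two binary metrics can be computed by hand as shown below. This sketch follows the usual definitions (F1 as the harmonic mean of precision and recall, and ROC AUC as the probability that a positive example is scored above a negative one). It is independent of the SDK and does not show how the binary_metric parameter is passed.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of positive/negative pairs ranked correctly; ties count as half."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
predicted = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 cutoff chosen for the example
```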

Release 1.16.0 – October 28, 2020

Promoted from Beta to version 1