Release Notes

This page documents the notable changes from each new version of the TripleBlind Access Point and SDK. The most recent release is always at the top.


Release 1.52.0 – September 20, 2023

Web-based Blind Stats

The web user interface now allows you to run a Blind Statistics analysis without writing any Python code, with all the same privacy guarantees! The 🔗Create New Process page allows you to launch a Blind Stat. The Blind Stats wizard walks you through picking the dataset(s) you want to work with, then selecting the statistical operations you want to perform.

The operation still requires permission to access assets, either through Agreements or manual permission grants. Once the process completes, the results can be downloaded directly from your Access Point via a link in the Router interface, again retaining the same level of privacy guarantees offered by Python scripts.

Web Interface performance refinements
  • The Audit pages have been refined, improving responsiveness.
  • The Q&A pages have been refined, improving responsiveness.
Other changes and bug fixes
  • Optimized the Swagger documentation generator.
  • The Sentiment Analysis, Redact and Knowledge Graph examples and protocols have been retired.
  • The SDK train() and predict() methods now raise exceptions when the operation fails instead of returning None.
  • BUGFIX: Filtering by owner in the web interface would ignore keyword search terms, instead displaying all owned assets.
  • BUGFIX: Parsing of the tb asset retrieve command would ignore any specified “Save As” name.
  • BUGFIX: Using the Stop functionality with a Jupyter Notebook cell while it is waiting on a remote action to complete would generate infinite error messages and eventually crash the browser.
  • SECURITY: Switched to Safetensor instead of Pickle in the image preprocessor serializer.
  • SECURITY: Resolved several potential security issues.


Release 1.51.0 – August 21, 2023

Python Preprocessor

Tabular data preprocessing can now be performed using Python code, in addition to the existing SQL technique. A script using the tripleblind library can call the new TabularPreprocessorBuilder.python_transform() method, pointing to the transform function to apply to a Pandas dataframe. Any Pandas or NumPy method can be used, but all other libraries are blocked. As with SQL transforms, these preprocessing steps are visible to the data owner for inspection before approving usage.

The Python code can be provided as either a string or, more conveniently, a filename containing the transform code. For example:

preprocess_b = (
   tb.TabularPreprocessor.builder()
   .all_columns(True)
   .dtype("float32")
   .python_transform("python_transform_script.py")
)


The python_transform_script.py file must define a transform(df: pd.DataFrame) function that returns a Pandas DataFrame, which is then used in the given operation. See the example/Data_Munging/2b_model_train.py script for further reference.
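As a rough illustration, a transform script might look like the following. Only the transform(df) entry point comes from the description above; the column names (height_cm, weight_kg, bmi) are hypothetical and chosen purely for this sketch.

```python
import numpy as np
import pandas as pd


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Derive a BMI column and drop rows where it cannot be computed.

    The column names used here are hypothetical examples.
    """
    out = df.copy()
    height_m = out["height_cm"] / 100.0
    out["bmi"] = out["weight_kg"] / (height_m ** 2)
    # A zero height produces an infinite value; treat it as missing and drop it.
    out = out.replace([np.inf, -np.inf], np.nan).dropna(subset=["bmi"])
    return out.reset_index(drop=True)
```

Because the function only uses Pandas and NumPy, it stays within the libraries the preprocessor permits.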

To support developing these scripts, two new utility commands are available: tb preproc create [asset] and tb preproc test [script.py] [data.csv].

Web Interface refinements
  • Assets can now be deleted from the web interface.
  • The “FAQ” functionality is now known as “Q&A”.
  • Many small visual tweaks for consistency and usability.
Other
  • All tabular results now include a column header line.
  • Record identifiers can now be requested with inference output. This can be particularly helpful in validation of new models against ground truth.
  • Performance has been improved during some protocol evaluations.
  • Improvements to the Windows SDK installer to better account for the potential variety of host machines.
  • BUGFIX: Preprocessor command preview was only displayed to the first data owner when a single preprocessor was shared by all assets.
  • SECURITY: Alpha-numeric (plus underscore) tabular column names are now fully enforced.
  • SECURITY: Addressed various CVEs by updating dependencies.


Release 1.50.0 – July 6, 2023

Inference Identifier Columns

When running an inference, the default behavior is to deliver only the result, ensuring identifying information isn’t leaked. However, when testing inference correctness during model development, it is often necessary to tie the output back to the data via an identifier. The inference operation now allows the invoker to request “identifier_columns” as part of the result output when working with tabular data. This allows the data owner to correlate the output back to the data they own.

NOTE: The inference result asset now includes column headers. If you open this as a dataframe, the Pandas header autodetection often fails and reads the first line as data instead of a header row. You can correct this by either using pandas.read_csv(FILENAME, header=0) or calling result.load(header=True) against the TripleBlind result asset.
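The effect of the explicit header setting can be seen with plain Pandas. This is a small sketch using made-up CSV content (the patient_id and prediction columns are hypothetical); it contrasts header=0 with the failure mode where the header line is read as data.

```python
import io

import pandas as pd

# Hypothetical inference output with an identifier column and a header row.
csv_text = "patient_id,prediction\n101,0.82\n102,0.13\n"

# header=0 tells Pandas explicitly that the first line holds column names.
with_header = pd.read_csv(io.StringIO(csv_text), header=0)

# header=None reproduces the failure mode: the header line is read as data.
without_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(list(with_header.columns))
print(len(with_header), len(without_header))
```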

Consistent SDK tb.Operation usage

An early design decision resulted in the same operation using independent identifiers when creating an agreement and when used to start a process. In most cases these were at least similar, although a few differences could be confusing. Now all SDK code uses tb.Operation.NAME exclusively. This will require a change to some existing scripts that directly call create_job() such as:

job = tb.create_job(operation=tb.RECOMMENDER_TRAIN, params={...})

is now:

job = tb.create_job(operation=tb.Operation.RECOMMENDER_TRAIN, params={...})

For most scripts, simply adding the “.Operation” is all that is needed. However there are a few exceptions:

  • tb.REVEAL / tb.Operation.PRIVATE_QUERY are now tb.Operation.BLIND_QUERY
  • tb.ROI_DETECTOR_TRAIN is now tb.Operation.REGION_OF_INTEREST_TRAIN
  • tb.PSI_JOIN is now tb.Operation.BLIND_JOIN
  • tb.HEAD is now tb.Operation.BLIND_SAMPLE
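For scripts that use the renamed identifiers, a small text rewrite can apply the exceptions listed above. The following is only a sketch of one possible migration helper, not a TripleBlind tool; the migrate() function and RENAMES table are names invented for this example, and it deliberately handles only the listed exceptions.

```python
import re

# Renames taken from the list above: old identifier -> new identifier.
RENAMES = {
    "tb.REVEAL": "tb.Operation.BLIND_QUERY",
    "tb.Operation.PRIVATE_QUERY": "tb.Operation.BLIND_QUERY",
    "tb.ROI_DETECTOR_TRAIN": "tb.Operation.REGION_OF_INTEREST_TRAIN",
    "tb.PSI_JOIN": "tb.Operation.BLIND_JOIN",
    "tb.HEAD": "tb.Operation.BLIND_SAMPLE",
}

# Match whole identifiers only, so that e.g. "tb.HEADER" is left untouched.
_PATTERN = re.compile(
    r"(?<![A-Za-z0-9_.])("
    + "|".join(re.escape(old) for old in sorted(RENAMES, key=len, reverse=True))
    + r")(?![A-Za-z0-9_])"
)


def migrate(source: str) -> str:
    """Return the source text with the old operation identifiers replaced."""
    return _PATTERN.sub(lambda match: RENAMES[match.group(1)], source)


print(migrate("job = tb.create_job(operation=tb.PSI_JOIN, params={})"))
```

Review the rewritten script before running it, since a plain text substitution cannot tell code apart from comments or strings.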
SDK
  • Added “mask” and “unmask” commands to the command line utility, for example: tb unmask "my dataset" "field1" "field2"
  • Improved responsiveness of tb request access request management.
  • The tb.ModelAsset.find() method now returns a ModelAsset object, not a generic Asset.
  • Improved efficiency of progress messages shown during the execution of an operation. It had been needlessly querying the Router multiple times a second.
Other
  • Various small tweaks to make the interface more intuitive and consistent.
  • BUGFIX: Statistical operations now auto-coerce data to float for all operations except “count”. Previously, a value such as “NaN” in a dataset could make the column be categorized as a string, preventing calculation of things like “mean” on the whole column.
  • SECURITY: Adjusted timing of response to avoid phishing attacks looking for emails with user accounts.
  • SECURITY: Addressed various CVEs by updating dependencies.


Release 1.49.0 – June 1, 2023

Enhanced CSV Import

The “comma separated value” (aka CSV) file is a workhorse in data science. Spreadsheets can be saved to this format, most tools can import and export them, and they are easy to examine manually. However their flexibility can be problematic with missing header lines, duplicate field names, field names that can’t be used by some tools, or corrupt data due to blank rows. The web interface and the SDK now provide more tools to make it easy to identify and fix problems like this.

The web UI now provides a preview of the import, allows custom separators, and catches invalid files with varying record sizes. The CSVDataset.position() method in the SDK also performs validations and provides tools to automatically rename invalid or duplicate field names.
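To make the kinds of problems concrete, here is a minimal sketch of such checks written with the Python standard library. It is not the CSVDataset implementation; the check_csv() function is hypothetical, and the "letters, digits, and underscore" naming rule is an assumption used for illustration.

```python
import csv
import io
import re

# Assumed naming rule for this sketch: letters, digits, and underscores only.
VALID_NAME = re.compile(r"^[A-Za-z0-9_]+$")


def check_csv(text, delimiter=","):
    """Return a list of human-readable problems found in CSV text."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return ["file is empty"]

    problems = []
    header = rows[0]
    seen = set()
    for name in header:
        if not VALID_NAME.match(name):
            problems.append(f"invalid field name: {name!r}")
        if name in seen:
            problems.append(f"duplicate field name: {name!r}")
        seen.add(name)

    # Row numbers count parsed records, with the header as row 1.
    for row_no, row in enumerate(rows[1:], start=2):
        if not row:
            problems.append(f"row {row_no}: blank row")
        elif len(row) != len(header):
            problems.append(
                f"row {row_no}: expected {len(header)} fields, found {len(row)}"
            )
    return problems


sample = "id,first name,id\n1,Ann,2\n\n3,Bob\n"
for problem in check_csv(sample):
    print(problem)
```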

Streaming Inference Enhancements

The ability to produce a continuous stream of inferences at regular intervals was introduced in release 1.47. This has been enhanced in several ways:

  • Both parties involved in the operation (data and model provider) can now view the inference stream. See the SDK demo/Continuous_Inference/4c_stream_updates.py example.
  • The inference interval adjusts to account for any delay when acquiring data and for the inference calculation itself.
Model Factory

The SDK now has a ModelFactory for creating custom neural networks using standard architectures. While they can also be defined manually, the definition can be verbose with repeating layers, also making it easy to create subtle bugs. The ModelFactory.vgg() method can create a Visual Geometry Group style network with a single line of code. See the Cifar example’s 2_train.py step for details.

Web Interface
  • Granular email notification settings. See the “Notifications” tab under My Settings, found in the upper right user menu.
  • Performance optimizations, part of an ongoing effort to improve the user experience.
  • Blind Report parameters now appear in the order used at creation time rather than alphabetical order by parameter name.
  • BUGFIX: Audit reports could not be downloaded in some situations.
  • BUGFIX: Declined access requests could get stuck as “In Progress”.
SDK
  • Error messages are more verbose, part of an ongoing effort to improve the user experience.
  • Update some tutorial notebooks to use current best practices.
  • BUGFIX: Allow killing of “zombie” processes. These occur when an Access Point restarts in the middle of a running process and similar situations.


Release 1.48.0 – April 19, 2023

Preprocessor Data Cleaning and Normalization tools

Working with data which you cannot see can be tricky. The preprocessor now provides several ways to cleanse data using the TabularPreprocessor.handle_nan() method. With it, you can set missing or Not a Number numeric values to a fixed value; to a statistical value such as the max, min, mean, or median; or you can simply “drop” the row containing the missing value.

The TabularPreprocessor.replace(before, after) method allows replacing any value in the dataset before it gets used in an operation. This can be combined with the handle_nan() method to replace a value with a statistical property or drop it altogether.

import numpy as np
import tripleblind as tb

# Drop any row containing missing values in any column
pre = tb.TabularPreprocessor.builder().all_columns(True).handle_nan("drop")

# Fill any missing value in column "A" with the mean of the values found in
# that column, and fill missing values in column "B" with the number 42.
pre = (
   tb.TabularPreprocessor.builder()
   .all_columns(True)
   .handle_nan({"A": "mean", "B": 42})
)

# Replace the value -1 in column "A" with NaN.  The handle_nan() method could
# be used to convert that into a replacement like the mean.
# Also change the value "M" to "male" in column "B".
pre = (
   tb.TabularPreprocessor.builder()
   .all_columns(True)
   .replace({"A": (-1, np.nan), "B": ("M", "male")})
)
Process Management

Users now have much better control over processes they have created. Administrators can also be granted the ability to control others' processes with the new Process > Manage permission.

Within the web interface Active Process list, you can now Cancel any running or queued operation. This is also possible from the command-line interface using the tb process cancel command, or optionally by using Ctrl+C in an SDK operation which is waiting on the process to complete.

Additionally, the new tb process connect command allows you to reconnect and view any running process. This can be useful with long running operations such as model training. The SDK provides access to the process output stream via the StatusOutputStream object obtained from Job.get_status_stream() or returned by invoking Model.infer(stream_output=True).

Learning Rate Scheduler

Blind Learning and Federated Learning protocols can now define a learning rate scheduler to vary how quickly the model learns as the training progresses. You can specify either the CyclicLR or Cyclic Cosine Decay scheduler. See example/Cifar in the SDK for an example of how this works. If not defined, learning occurs at a constant rate.
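For readers unfamiliar with cyclic schedules, the sketch below computes the widely used triangular cyclic learning rate, which rises linearly from a base rate to a maximum and back again. It only illustrates the general idea; the function and parameter names are assumptions made for this example and do not reflect how the scheduler is configured in the SDK.

```python
import math


def triangular_cyclic_lr(step, base_lr, max_lr, step_size):
    """Learning rate at a training step under a triangular cyclic policy.

    step_size is the number of steps in each half cycle (rise or fall).
    """
    cycle = math.floor(1 + step / (2 * step_size))
    x = abs(step / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)


# The rate climbs for 4 steps, falls for 4 steps, then repeats.
for step in range(0, 9, 2):
    print(step, triangular_cyclic_lr(step, base_lr=0.001, max_lr=0.01, step_size=4))
```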

Data positioning wizards for Azure Data Lake / Blob Storage, Amazon S3, and Amazon Redshift

The web user interface can now be used to create assets that reference data stored in several popular cloud databases, as well as simple CSV datasets. These are accessed via the New Asset button in the upper right when browsing assets.

Settings and Credential Management

User settings have been reorganized under the My Settings dialog found by clicking your name/avatar in the upper right corner. In addition to the general reorganization, you can now retrieve a pre-populated tripleblind.yaml when setting up your development environment or resetting your access tokens by visiting the Authentication tab and clicking Download tripleblind.yaml.

Web Interface
  • Revamp of the Agreements page to clearly show incoming and outgoing agreements for access to assets.
  • Support scientific notation for Blind Report parameters.
  • SECURITY: Email login is now case-insensitive.
  • BUGFIX: Blind Report would sometimes show links to operations which were not Blind Report runs.
  • BUGFIX: The CSV export of an Audit was not always working.


Release 1.47.0 – March 15, 2023

K-Means Clustering

K-Means Clustering is an algorithm to partition data into k non-overlapping subgroups (aka clusters) where each data point fits into only one of these subgroups. The algorithm attempts to partition so that the data in any single cluster is as similar as possible. During the process, the actual content of the data is not exposed to the other party. In the end only the cluster centers (means) are revealed along with an inertia value which indicates how well the clusters fit the raw data, plus information allowing the trainer to match their training input with the appropriate cluster classification. This can be performed on distributed datasets owned by two separate parties, with an optional private set intersection used to find matching data. Finally, inference can be performed with the trained model for predicting the cluster new data would fit into.
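The outputs described above (cluster centers, an inertia value, and a cluster label per input record) can be illustrated with a plain, non-private version of the algorithm. This sketch runs on a single local dataset with a naive initialization and does not reflect how the privacy-preserving protocol works internally; all names are chosen for this example.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Index of the nearest center for each point."""
    return [
        min(range(len(centers)), key=lambda c: squared_distance(p, centers[c]))
        for p in points
    ]


def kmeans(points, k, max_iterations=100):
    """Lloyd's algorithm; returns (centers, labels, inertia)."""
    # Naive deterministic initialization: use the first k points as centers.
    centers = [list(p) for p in points[:k]]
    labels = assign(points, centers)
    for _ in range(max_iterations):
        # Move each center to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        new_labels = assign(points, centers)
        if new_labels == labels:
            break
        labels = new_labels
    # Inertia: sum of squared distances from each point to its cluster center.
    inertia = sum(squared_distance(p, centers[l]) for p, l in zip(points, labels))
    return centers, labels, inertia


points = [(1, 1), (1, 2), (9, 9), (10, 9), (2, 1), (9, 10)]
centers, labels, inertia = kmeans(points, k=2)
print(centers, labels, inertia)
```

A lower inertia means the points sit closer to their assigned centers.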

Real-time Streaming Inference

The federated inference protocol now supports streaming inferences on a regular interval. This is a powerful combination when paired with a live database asset, enabling real-time predictions based on live data.

Blind Report in web interface

Blind Reports can now be run directly from the web interface. When a Report asset is visited, users can set the desired parameters and launch a report. The “Your Reports” section shows results of previous runs.

Reports are still defined using the SDK, and now more error checking is performed to prevent creating invalid reports.

Web Interface
  • BUGFIX: The Mock data search bar and column rename were not functional.
  • Administrators can now initiate a password reset for other users
  • Add link to the Learning System from Help
  • Pages have better titles in browser history
  • Notifications are now turned off for new users by default
  • Tooltip explanations for permissions
SDK
  • BUGFIX: Blind Joins and Blind Reports can successfully return an empty result if no overlap or output is generated. Previously it ended with an error code.
  • A new set of Regression Asset classes simplify training and inferring against a variety of regression model types.


Release 1.46.0 – February 1, 2023

Parameterized Blind Report

Predefined database-backed reports can now be defined, allowing variable but secure queries to be run safely on even the most sensitive data. For example, a report could be defined that allows a user to select a year and month along with a company division for generating salary statistics by ethnicity or gender for usage in compliance reporting. Any number of variables and any complexity of queries are supported. See the examples/Blind_Report for documentation and more information.

PSI + Vertical Regression

Regression training and inference can now be performed on vertically partitioned, distributed datasets. Additionally, PSI can be applied to allow operation only against common records, without revealing that overlap in the process. See the examples/Vertical_Regression and examples/PSI_Vertical_Regression for more information.

PSI + Vertical Decision Trees

Decision trees can now be trained on vertically partitioned, distributed datasets. This allows training models between organizations with unknown numbers of overlapping customers or records, without revealing the overlap in the process. See the examples/PSI_Vertical_Decision_Tree for documentation and more information.

Other
  • Changed login and password reset to operate using the user’s email address instead of a username.
  • Add tb ps cancel command for canceling queued processes from the command line.
  • The new optional unmask_columns= parameter can be used to unmask while positioning tabular datasets.
  • Add warehouse= parameter for Snowflake assets to select a Snowflake Compute Warehouse.
  • Various web interface refinements to the navigation menu and asset details pages.
Fixes
  • Security: Prevent an attacker from re-queuing a previously approved operation in certain situations.
  • Security: Prevent a potential password reset replay attack.
  • Security: Prevent certain combinations of preprocessor operations in Blind Join which could be potentially exploited and have no practical value.
  • Security: Highlight SQL_Transform preprocessor steps during Access Request approvals for additional scrutiny.


Release 1.45.0 – December 20, 2022

Decision Trees for Vertically Partitioned Data

Vertically partitioned data arises when multiple organizations hold different pieces of information about the same individual. For example, an individual may have general demographic information in one dataset, medical history in another, and pharmacy records in a third. With a common identifier, such as a social security number, datasets owned by different organizations can remain in place yet still be used as training data for regression or classification decision trees. See the examples/Vertical_Decision_Tree in the SDK.

Blind Join Refinement

The Blind Join protocol has been enhanced to better match expectations for those familiar with various types of SQL join operations. Now a user can choose from Left or Inner (traditional or Partitioned) join operations. See examples/Blind_Join and the blind_join documentation for more information.

Data Connectors
  • New Azure Blob Storage data connector.
  • Add chunking support for MongoDB to support larger-than-memory datasets.
  • Rename the examples/Database_Connectors to the more appropriate Data_Connectors.
SDK Ease of Use

In order to simplify data management and keep examples focused, several changes are being made:

  • Input assets have been renamed (typically beginning with “EXAMPLE - ”) and the use of run_id.txt discontinued.
  • The new data_requirements.yaml file points to the source of data used for the example, typically under the examples/Assets directory.
  • Most asset-related error checking is eliminated when tb.initialize(example=True) is used.

This revamp has been done for around half of the examples and will be completed in the next release.

As part of this ease-of-use effort, parameters to protocols are now validated before usage. This allows for immediate and clear error messaging when a parameter is incorrect (such as an unknown or misspelled loss function during training) or a required parameter is missing.

Web Interface
  • Organization owners can now require or disable the usage of 2FA for their users.
  • Revamped the presentation and owner management of data assets.
Other
  • Expand ONNX support with 20 new operators.
  • Support custom SSL trust chains, defined in system settings.
  • Split Enterprise Mode endpoints on the Access Point to a unique port, allowing for distinct security settings for internal users.
Fixes
  • Performing accept/deny of an access request via the tb requests command line utility worked correctly, but gave incorrect feedback.
  • Fixed the inference script in the LSTM example.
  • Adjusted the timeout on the Redshift connector to make it more reliable.
  • Blind Statistics could fail on data containing NaN (not a number) values.


Release 1.44.0 – December 5, 2022

Bookmarks

Users can now create and curate their own set of bookmarked Assets. These can be kept in a general set of bookmarks, or under a named collection. Clicking the icon in the upper-right of an asset view brings up a dialog to add or manage the bookmark. The new My Bookmarks page shows everything you have bookmarked.

Permission Assignments at Invitation Time

Administrators can now assign permissions while inviting the user. Previously the user had to accept the invitation first, requiring the administrator to grant permissions after this acceptance before the new user could do anything.



Asset Audit and Browsing Performance Improvements

Several optimizations were added to make response times for both the web interface and SDK operations more snappy. Additionally, the default date range for audits now covers the last seven days.

Fixes
  • Portal searches to “preprocessor” and “tripleblind” module reference documentation were broken
  • Missing asset message improved
  • Improved messaging when information is requested from an Access Point that is offline
  • Email notifications now point to the support website instead of an email address
  • Added “last seen” to user details in administrative view
  • Exact searches from the tb utility could potentially return non-matching assets.
  • Asset details no longer available to users without visibility, even with the correct URL link
  • Updated several dependencies to latest known-secure versions.


Release 1.43.0 – November 2, 2022

Audit Asset overhaul

The 🔗Audit Asset Usage page has been significantly expanded. Three types of reports are now easily accessible on the tabs:

  • Activity Report - detailed list of all individual operations involving assets owned by you
  • Weekly Report - a summary of asset usage, showing daily and weekly total access counts
  • Usage by User - summary of users who have utilized your assets during a given time range
Data connectors

New data connectors make it easy to connect your data for usage within TripleBlind. Now you can connect an asset to:

  • Amazon Redshift
  • Amazon S3 Bucket
  • Azure Data Lake Storage

Additionally, examples of all connectors have been gathered into one location in the SDK. See the examples/Data_Connector subdirectory.

NOTE: The Snowflake example has been folded into this location.

Modeling Capabilities
  • Predictive Model Markup Language (PMML) support has expanded to include randomForest tree definitions.
  • Add SMPC GlobalAveragePool ONNX operator.
Asset Explorer facelift

The 🔗Explore Assets browser is easier to use than ever. A new search and filter bar docks at the top of the window as you scroll through assets, making it easy to refine your search. Additionally, the asset cards have been enhanced with clearer styling and, for owned assets, information about the user who positioned them.

SDK

A variety of changes have been made to make the SDK easier to use.

  • New method TableAsset.get_column_names()
  • Parameter validation
  • Better error messages.
  • Chunked uploads when positioning data
Fixes
  • Admin panel user search is now case insensitive
  • Fix incorrect conv_layer_A and conv_layer_B outputs when strides != [1, 1].


Release 1.42.0 – October 5, 2022

Two-Factor Authentication

Two-factor authentication (2FA) is now available under each user's 🔗My Account page in the Security Settings section. Once enabled, the user will be walked through the process to set up 2FA.

Web User interface

A handful of small but impactful changes have been made to the web interface:

  • The documentation Portal is now available to all! Links to documentation can be freely shared with colleagues.
  • The preprocessor steps are now reflected in the Asset Request for operations which utilize them.
  • Mock Data now supports generating mock Age, Sex, Date, and Time values.
  • Process creation now allows going backwards in the setup steps.
  • Links and emails now point to the new TripleBlind support site.
Installers
  • Mac/Linux: Previous installs of Anaconda would break the TripleBlind installer due to conflicts between Anaconda and the Mamba solver. Now a warning is shown and the slower Conda solver is used if Anaconda is detected.
  • Mac (M1): The installer was giving a false failure message.
SDK
  • The blind_join() method now accepts a custom preprocessor, which will override the default.
  • When the `allow_overwrite` param is set during asset positioning, an existing private asset of the same name will now be archived.
Security
  • Tighten security settings for session id and CSRF token cookie.
  • Web logins now expire after 48 hours. Users will be prompted to log in again after expiration.
  • Closed a potential XSS vulnerability which could allow JavaScript code to be executed if their JSON packet was viewed in a browser (unusual, but possible).


Release 1.41.0 – August 31, 2022

ONNX Support

The Open Neural Network Exchange format (.onnx) is now supported for positioning neural networks. This format has the most complete set of features and is now the preferred format for working with TripleBlind, although basic PyTorch (.pth) and Keras (.h5) model support will remain. See the Pretrained_NN_Inference example (replacing the older Network_Provisioning example) to see how to work with all three model types.

Additionally, a new network compatibility tool has been added to the tb utility. It can be invoked using tb validate [DIRECTORY | FILENAME] and will flag networks which use unsupported layers or features.

Access Requests

The Access Request page now allows viewing of full job detail, including parameters included on requests. This gives data owners the information needed to fully understand what is being proposed as a process with their dataset.

Additionally, Audit logs now record denied requests.

Knowledge Graph Random Forest

A new and powerful network training technique has been added to the toolkit. Knowledge Graphs can represent complex interactions between disparate data nodes. An example of this technique in action to train and use a network to detect money laundering can be seen in the Knowledge_Graphs example.

SDK

The new tb.ModelAsset class makes using a trained neural network or statistical model as easy as a single line. Example usage:

import tripleblind as tb

# Look for a model Asset ID from a previous training
trained_network_id = tb.util.load_from("model_asset_id.out")
trained_network = tb.ModelAsset(trained_network_id)

result = trained_network.infer(
   "test.csv",
   preprocessor=tb.DocumentPreprocessor.builder().document_column("TEXT"),
)
print("Inference Results:")
print(result.table.dataframe)
Security

Several new security features have been implemented this release:

  • Password strength meter and complexity requirements
  • Resolved CVEs in several common libraries


Release 1.40.0 – August 3, 2022

Python 3.9 Update

The default SDK environment has now been updated to Python 3.9.13, along with appropriate updates to other support libraries. In addition to the new syntax and base library features for the Pythonistas out there, this also improves the security posture as Python 3.7 approaches its end of life.

Mock Data Editor

Mock Data is now customizable by data owners directly from the asset profile screen in a WYSIWYG (what you see is what you get) way rather than the old method under Asset > Manage > Mock Data.

PSI extensions

Several improvements have been made to the PSI functionality:

  • Independent preprocessors can be specified for each input dataset in a PSI Join.
  • Multiple match columns can be specified for PSI Vertical operations.
  • Improve PSI memory usage efficiency when working with massive datasets.
Security Updates
  • HTTP Strict Transport Security (HSTS) is now enforced in response headers
  • Use HMAC instead of Hash for access point challenge response
  • Employ Server-Side Request Forgery (SSRF) protection for access point ping from the Router
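The difference between a plain hash and an HMAC in a challenge-response exchange can be shown with the Python standard library. This is a generic sketch of the pattern, not the Access Point's actual protocol; the function names are invented for this example, and how the two sides come to share the key is outside its scope.

```python
import hashlib
import hmac
import secrets


def make_challenge() -> bytes:
    """Verifier side: generate a fresh random challenge."""
    return secrets.token_bytes(32)


def respond(shared_key: bytes, challenge: bytes) -> bytes:
    """Prover side: a keyed digest of the challenge (HMAC-SHA256)."""
    return hmac.new(shared_key, challenge, hashlib.sha256).digest()


def verify(shared_key: bytes, challenge: bytes, response: bytes) -> bool:
    """Verifier side: recompute the response and compare in constant time."""
    expected = respond(shared_key, challenge)
    return hmac.compare_digest(expected, response)


key = secrets.token_bytes(32)
challenge = make_challenge()
print(verify(key, challenge, respond(key, challenge)))
```

Unlike an unkeyed hash, the response cannot be computed by a party that does not hold the shared key.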
Other updates
  • Add the ‘sha256()’ method for use in sql_transform() operations.
  • Masking settings are now enforced in Blind Query operations, preventing preprocessor and other potentially misleading requests for data use from being submitted.
  • Blind Query now honors the k-Grouping setting, rejecting output that doesn’t cross the threshold.
  • BUGFIX: `tb --version` would crash
  • BUGFIX: Agreement search in the GUI gave inconsistent results
  • BUGFIX: Asset Explorer search in the GUI gave inconsistent results
  • BUGFIX: Correct grammar in registration/password reset emails.


Release 1.39.0 – June 29, 2022

Distributed Regression

Training a regression model on distributed datasets is now supported via the tb.TableAsset.fit_regression() method. Multiple datasets can be assembled into a set of single virtual records via a private set intersection (PSI) on a unique key value found in all of the datasets. The resultant fitted model can also be used to make predictions against distributed datasets. See the PSI_Regression example.

Dataset k-Grouping Setting

All data assets now have an associated “k-Grouping” setting. Based on the concepts of k-anonymity, this value is used in aggregation methods to make sure no group smaller than the setting is reported in aggregations. This helps reduce probing for information about individuals based on outside knowledge of some characteristic of that individual, commonly referred to as a linkage attack. Initially this applies to the statistics methods in Blind Stats.
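The effect of such a threshold on an aggregation can be sketched in a few lines. This is a simplified illustration of suppressing small groups, not the Blind Stats implementation; the grouped_mean() function and the sample records are hypothetical.

```python
from collections import defaultdict


def grouped_mean(records, group_key, value_key, k):
    """Mean of value_key per group, omitting groups with fewer than k records."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record[value_key])
    return {
        group: sum(values) / len(values)
        for group, values in groups.items()
        if len(values) >= k
    }


records = [
    {"dept": "A", "salary": 10},
    {"dept": "A", "salary": 20},
    {"dept": "A", "salary": 30},
    {"dept": "B", "salary": 99},  # a group of one would expose an individual
]
print(grouped_mean(records, "dept", "salary", k=3))
```

With k=3, the single-record group "B" is left out of the result, so its member's value cannot be read off directly.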

Access Point Enterprise Mode

The TripleBlind network architecture is very flexible, but enterprise security architecture sometimes calls for less flexibility. An Access Point can now be configured for enterprise mode which results in two things:

  1. API calls made by SDK scripts will route through the Access Point rather than calling the Router’s API endpoints directly. This reduces the necessary exception in external firewalls to the single port 443 route to/from the Access Point.
  2. Users must be provisioned at the Access Point, providing a tight “multiple-signature” access control that is entirely in the hands of the enterprise IT department which hosts the Access Point.

Learn more about this in the Enterprise Access Point Setup and in the Access Point Administration guides.

Security Enhancements
  • Authentication tokens expire after 6 months. All existing auth tokens will need to be regenerated by visiting the 🔗My Account page.
  • Authentication tokens can no longer be shown, only refreshed.
SDK enhancements
  • New tb.TableAsset.mask_columns() and tb.TableAsset.unmask_columns() methods.
  • New tb.Requests.accept_all() and tb.Requests.deny_all() methods.
  • SDK network calls are more resilient to slow and unreliable network connections.
  • Improved SDK support for Windows and M1 based Macs.
Other updates
  • The technique used to securely average models during blind learning has been significantly improved, decreasing time for each average by 15x.
  • Output from the Blind Sample is no longer “re-masked” when viewed in the Router interface.
  • The dedicated Median operation has been deprecated, and the Median example in the SDK has been removed. This functionality is now available as part of Blind Stats, along with various other ordered and descriptive statistics (see the Statistics example in the SDK).


Release 1.38.0 – May 26, 2022

Statistics Support

The term “order statistics” refers to the class of basic statistical operations which require ordering the data. This includes many common operations, such as min, median and max. This is now possible with distributed datasets without sharing any of the data amongst the participants, retaining perfect privacy. The full list of supported statistics is:

  • Min
  • Max
  • Median
  • Quartiles
  • Mean
  • Variance
  • Standard deviation
  • Skew
  • Kurtosis
  • Count
  • Confidence interval
  • Standard error

Additionally, stratification of samples is supported through a grouping function. A type of k-anonymity guarantee is included, which blocks calculations that involve fewer records than the k value (currently hard-coded to 5).

Mock Data Editor

The ability to control how Mock Data appears has been greatly enhanced with the new Mock Data Editor. A sensitive field can now be marked as masked (hiding the content of the raw data behind a “mask” whenever it is made visible), and those masks can be customized. For example a field could be assigned a Full Address mask so a very realistic looking address such as “250 Brooks Radial Suite 868 Meyerview, NE 95054” would be displayed. Several dozen of these masks exist, allowing for very realistic example and sample data. You can see this in the Blind_Sample SDK example and the Create New Process > Blind Sample.

Preprocessors

Several new preprocessor mechanisms are being developed and are available in this release as Early Access. Currently, these wrap Scikit-Learn transformers and address typical preprocessing needs: One-Hot Encoding, Ordinal Encoding, and Binning. Each is wrapped using OneHotEncoder, OrdinalEncoder, and KBinsDiscretizer respectively. See th