Blind Join
Blind Join builds on Blind Match but provides powerful additional functions:
- Identify the subset of data within 3rd-party datasets that match to features within your own dataset, and bring in additional feature information for that subset of the 3rd party’s data.
- Perform “fuzzy matching” (using Jaro-Winkler distance) on identifiers that may differ slightly (e.g., name or address).
Blind Join is a Safe with Care operation (see Privacy Assurances and Risk in the Getting Started section of the User Guide), and has the potential for misuse. TripleBlind has a number of safeguards for its use:
- Blind Join is not permitted to return any columns the Asset Owner has masked; the assumption being that masked columns contain PII/PHI or otherwise sensitive information.
- Blind Join is disabled by default at our strictest security levels.
- Unless an Agreement has been established permitting auto-approval of requests, all Blind Join operations require an informed Asset Owner approval through an Access Request.
k-Grouping
is respected in the Blind Join operation as a minimum record threshold on the output; a join that would result in fewer thank
records would automatically fail with a warning message.
Operation
- Use the
blind_join()
method to join your dataset with one or more dataset assets. - When using
add_agreement()
to allow a counterparty to use your dataset for a Blind Join, useOperation.BLIND_JOIN
for theoperation
parameter. Permissive agreements (without usage restrictions) are not recommended for use with this operation.
⚠️Blind Join is NOT recommended for HIPAA applications.
Parameters
intersect_with: Union[Asset, List[Asset]]
- A dataset
Asset
(or a list ofAssets
) to be included in the Blind Join.
match_column: Union[str, List[str]]
- Name of the column to match. If not the same in all datasets, a list of the matching column names, starting with the initiator asset and then listing a name in each
intersect_with
dataset. - If a single fieldname is provided, each dataset must have the same name for that
match_column
, eg. “ID”.
match_fuzziness: Optional[float] = 0.0
- If specified, the fuzziness threshold for match values. This is a value between 0.0 and 1.0. Default is 0.0 (perfect match).
return_columns: Union[str, List[str], List[List[str]]]
- Name of the columns to be returned in the resultant dataset. This is a list of lists of column names such that each party has its own list of column names (starting, as with
match_column
, with the initiator asset). str:
If a single column name is provided, columns matching that name across all datasets will be returned. eg.["a"]
is interpreted as[["a"], ["a"], ["a"]]
List[str]:
If a list of names is provided, columns matching these names across all datasets will be returned. eg.["a", "b"]
is interpreted as[["a", "b"], ["a", "b"], ["a", "b"]]
List[List[str]]:
If a list of lists is provided, only columns matching the names from the respective dataset will be returned. eg.[["a", "b"], ["c", "d"]]
is interpreted as[["a", "b"], ["c", "d"]]
- If a column name is specified for an asset and it does not exist, it is ignored. If multiple datasets have the same name for a column, the output dataset will contain suffixes to differentiate the source of each field. e.g. If "name" is in multiple datasets, the return will contain "name", "name__1" & "name__2".
join_type: Optional[JoinType]
- Describes the format of the resulting dataset table. Default is
INNER
. INNER:
This is a SQL-like inner join where each party's return columns are included in the result set.INNER_PARTITIONED:
This is an inner join with each party's matching rows appended to the end of the dataset. There is no alignment by matching column. This format provides more privacy-protection and reveals less group-membership information than a standard inner join.LEFT:
This is a SQL-like left join where ALL records are returned from the initiator's table, with matching records from each party's return columns (where available).
job_name: Optional[str]
- Reference name for this process. This name will appear in the Access Request, Process History, and Audit Reports.
preproc: Optional[Union[TabularPreprocessor, List[TabularPreprocessor]]]
- The preprocessor(s) to use against the datasets. When no preproc is specified, the default preproc selects all columns.
silent: Optional[bool]
- Suppress status messages during execution? Default is to show messages.
Limitations
- A Blind Join using fuzzy matching is only supported for two parties.
- Blind Join is a powerful operation and can only be used when the initiator owns at least one of the datasets in the computation.
- When using Blind Join, the owned dataset must be supplied as the first (or left-side) dataset asset.
SELECT *
is not permitted insql_transform
preprocessors upstream of Blind Query and Blind Join. Useget_column_names()
to request a list of column names and list them explicitly in the query. This empowers the data owner to make an informed approval decision within any Access Requests against their dataset.