Blind Join

Blind Join builds on Blind Match but provides powerful additional functions:

  • Identify the subset of data within 3rd-party datasets that match to features within your own dataset, and bring in additional feature information for that subset of the 3rd party’s data.
  • Perform “fuzzy matching” (using Jaro-Winkler distance) on identifiers that may differ slightly (e.g., name or address).

Blind Join is a Safe with Care operation (see Privacy Assurances and Risk in the Getting Started section of the User Guide), and has the potential for misuse. TripleBlind has a number of safeguards for its use:

  • Blind Join is not permitted to return any columns the Asset Owner has masked; the assumption being that masked columns contain PII/PHI or otherwise sensitive information.
  • Blind Join is disabled by default at our strictest security levels.
  • Unless an Agreement has been established permitting auto-approval of requests, all Blind Join operations require an informed Asset Owner approval through an Access Request.
  • k-Grouping is respected in the Blind Join operation as a minimum record threshold on the output; a join that would result in fewer than k records would automatically fail with a warning message.

Operation

  • Use the blind_join() method to join your dataset with one or more dataset assets.
  • When using add_agreement() to allow a counterparty to use your dataset for a Blind Join, use Operation.BLIND_JOIN for the operation parameter. Permissive agreements (without usage restrictions) are not recommended for use with this operation.

⚠️Blind Join is NOT recommended for HIPAA applications.

Parameters

intersect_with: Union[Asset, List[Asset]]

  • A dataset Asset (or a list of Assets) to be included in the Blind Join.

match_column: Union[str, List[str]]

  • Name of the column to match. If not the same in all datasets, a list of the matching column names, starting with the initiator asset and then listing a name in each intersect_with dataset.
  • If a single fieldname is provided, each dataset must have the same name for that match_column, eg. “ID”.

match_fuzziness: Optional[float] = 0.0

  • If specified, the fuzziness threshold for match values. This is a value between 0.0 and 1.0. Default is 0.0 (perfect match).

return_columns: Union[str, List[str], List[List[str]]]

  • Name of the columns to be returned in the resultant dataset. This is a list of lists of column names such that each party has its own list of column names (starting, as with match_column, with the initiator asset).
  • str: If a single column name is provided, columns matching that name across all datasets will be returned. eg. ["a"] is interpreted as [["a"], ["a"], ["a"]]
  • List[str]: If a list of names is provided, columns matching these names across all datasets will be returned. eg. ["a", "b"] is interpreted as [["a", "b"], ["a", "b"], ["a", "b"]]
  • List[List[str]]: If a list of lists is provided, only columns matching the names from the respective dataset will be returned. eg. [["a", "b"], ["c", "d"]] is interpreted as [["a", "b"], ["c", "d"]]
  • If a column name is specified for an asset and it does not exist, it is ignored. If multiple datasets have the same name for a column, the output dataset will contain suffixes to differentiate the source of each field. e.g. If "name" is in multiple datasets, the return will contain "name", "name__1" & "name__2".

join_type: Optional[JoinType]

  • Describes the format of the resulting dataset table. Default is INNER.
  • INNER: This is a SQL-like inner join where each party's return columns are included in the result set.
  • INNER_PARTITIONED: This is an inner join with each party's matching rows appended to the end of the dataset. There is no alignment by matching column. This format provides more privacy-protection and reveals less group-membership information than a standard inner join.
  • LEFT: This is a SQL-like left join where ALL records are returned from the initiator's table, with matching records from each party's return columns (where available).

job_name: Optional[str]

  • Reference name for this process. This name will appear in the Access Request, Process History, and Audit Reports.

preproc: Optional[Union[TabularPreprocessor, List[TabularPreprocessor]]]

  • The preprocessor(s) to use against the datasets. When no preproc is specified, the default preproc selects all columns.

silent: Optional[bool]

  • Suppress status messages during execution? Default is to show messages.

Limitations

  • A Blind Join using fuzzy matching is only supported for two parties.
  • Blind Join is a powerful operation and can only be used when the initiator owns at least one of the datasets in the computation.
  • When using Blind Join, the owned dataset must be supplied as the first (or left-side) dataset asset.
  • SELECT * is not permitted in sql_transform preprocessors upstream of Blind Query and Blind Join. Use get_column_names() to request a list of column names and list them explicitly in the query. This empowers the data owner to make an informed approval decision within any Access Requests against their dataset.