Blind Stats

Blind Stats is a privacy-preserving operation that lets a dataset user compute descriptive statistics on a study population spanning multiple datasets, even when the data is held by different organizations or located in different regions.

Blind Stats is a Safe operation (see Privacy Assurances and Risk in the Getting Started section of the User Guide), but it still has the potential for misuse. TripleBlind provides a number of safeguards for its use:

  • Unless an Agreement has been established permitting auto-approval of requests, all Blind Stats operations require an informed Asset Owner approval through an Access Request. The Access Request for Blind Stats contains information on all requested statistics.
  • Requests are automatically rejected for Blind Stats operations when they would return descriptive information on groups of records that do not meet the minimum k-Grouping limits set on the involved datasets.
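
To illustrate the k-Grouping safeguard described above, here is a minimal local sketch in plain Python. It is not TripleBlind's implementation: the function name groups_below_k, the sample rows, and the value of k are all hypothetical, and the real check runs inside the platform against the limits set on each dataset.

```python
from collections import Counter

def groups_below_k(rows, group_by, k):
    """Return the group values whose record count is below k (hypothetical helper)."""
    counts = Counter(row[group_by] for row in rows)
    return sorted(group for group, n in counts.items() if n < k)

# Illustrative data only: site "B" contains a single record.
rows = [
    {"site": "A", "age": 34},
    {"site": "A", "age": 51},
    {"site": "A", "age": 47},
    {"site": "B", "age": 29},
]

small_groups = groups_below_k(rows, group_by="site", k=3)
# In this sketch a request is allowed only when no group falls below k.
request_allowed = not small_groups
print(small_groups, request_allowed)
```

In this sketch, a single undersized group is enough to reject the whole request, which mirrors the behavior described in the bullet above.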

Operation

  • Use the get_statistics() method to query the dataset for descriptive statistics.
  • When using add_agreement() to permit a counterparty to obtain descriptive statistics for your dataset asset, use Operation.STATS for the operation parameter.
  • Although the calculation of these statistics is done in a secure and private manner, be sure that the information that is returned (such as minimums, maximums, and quartiles) is acceptable to be shared before creating a permissive agreement with a counterparty.

Parameters

column: List[str]

  • Name of data column(s) upon which to calculate.

function: List[StatFunc] # Function(s) to calculate. If not specified, all stats are calculated.

  • StatFunc.CONFIDENCE_INTERVAL # 95% CI, labeled 'ci-lower', 'ci-upper'
  • StatFunc.COUNT # number of items in group, labeled 'n'
  • StatFunc.KURTOSIS # labeled 'kurt'
  • StatFunc.MAXIMUM # labeled 'max'
  • StatFunc.MINIMUM # labeled 'min'
  • StatFunc.MEAN # labeled 'mean'
  • StatFunc.MEDIAN # labeled 'median'
  • StatFunc.QUARTILES # labeled 'q1', 'median', 'q3'
  • StatFunc.SKEW # labeled 'skew'
  • StatFunc.STANDARD_DEVIATION # labeled 'sd'
  • StatFunc.STANDARD_ERROR # labeled 'se'
  • StatFunc.VARIANCE # labeled 'var'
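
As a rough guide to what these labels mean, the following local sketch computes several of them for one column using Python's standard library. It makes assumptions the text above does not confirm: the sample (n - 1) standard deviation and variance, a normal-approximation 95% confidence interval, and the default quantile method of statistics.quantiles. The describe function is hypothetical, and the values Blind Stats returns may be defined differently.

```python
import math
import statistics

def describe(values):
    """Compute a subset of the labeled statistics locally (illustrative only)."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)   # assumption: sample SD (n - 1 denominator)
    se = sd / math.sqrt(n)          # standard error of the mean
    q1, _q2, q3 = statistics.quantiles(values, n=4)  # default "exclusive" method
    return {
        "n": n,
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
        "sd": sd,
        "var": statistics.variance(values),  # assumption: sample variance
        "se": se,
        "ci-lower": mean - 1.96 * se,  # assumption: normal-approximation 95% CI
        "ci-upper": mean + 1.96 * se,
    }

stats = describe([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
for label, value in stats.items():
    print(label, value)
```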

combine_with: Optional[Union[Asset, List[Asset]]] = None

  • Other table(s) with the same data/columns to virtually combine for the calculation.
  • The combination of datasets using combine_with is a horizontal union/concatenation (not like a join).
  • To calculate statistics on a single dataset, omit combine_with from the call.
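
The difference between a horizontal union and a join can be sketched locally as follows. This is only an illustration of the concept with hypothetical tables and a hypothetical horizontal_union helper; in Blind Stats the combination is virtual and the rows are never pooled in one place.

```python
# Two tables with the same columns, held by different (hypothetical) organizations.
hospital_a = [{"patient_id": 1, "age": 34}, {"patient_id": 2, "age": 51}]
hospital_b = [
    {"patient_id": 7, "age": 29},
    {"patient_id": 8, "age": 62},
    {"patient_id": 9, "age": 45},
]

def horizontal_union(*tables):
    """Stack the rows of tables that share the same columns (no key matching)."""
    columns = set(tables[0][0])
    combined = []
    for table in tables:
        for row in table:
            if set(row) != columns:
                raise ValueError("all tables must share the same columns")
            combined.append(row)
    return combined

combined = horizontal_union(hospital_a, hospital_b)
ages = [row["age"] for row in combined]
mean_age = sum(ages) / len(ages)
print(len(combined), mean_age)
```

The combined table has as many rows as the inputs together, and its columns are unchanged. A join, by contrast, would match rows on a key and widen the table.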

group_by: Optional[str] = None

  • Data column for grouping data before the calculation.
  • Supports stratification on a single grouping column.
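
Stratification on a single column can be pictured with the local sketch below, which splits the rows by the grouping column and then computes statistics per group. The grouped_stats function and the sample data are hypothetical and only mirror the column and group_by parameter names for readability.

```python
from collections import defaultdict
import statistics

rows = [
    {"sex": "F", "bmi": 22.0},
    {"sex": "F", "bmi": 26.0},
    {"sex": "M", "bmi": 24.0},
    {"sex": "M", "bmi": 28.0},
    {"sex": "M", "bmi": 32.0},
]

def grouped_stats(rows, column, group_by):
    """Group rows by one column, then compute n and mean for each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row[column])
    return {
        key: {"n": len(values), "mean": statistics.fmean(values)}
        for key, values in groups.items()
    }

result = grouped_stats(rows, column="bmi", group_by="sex")
print(result)
```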

preproc: Optional[Union[TabularPreprocessor, List[TabularPreprocessor]]]

  • The preprocessor(s) to apply to the datasets. If a list is given, its order must match the datasets: the first entry applies to this TableAsset, and the remaining entries follow the order of combine_with.
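
The ordering rule for a list of preprocessors can be sketched as a simple pairing. The example below uses placeholder strings instead of real assets and preprocessors, and the pair_preprocessors helper is hypothetical. It assumes the first list entry belongs to the asset the method is called on and the rest follow combine_with in order.

```python
def pair_preprocessors(this_asset, combine_with, preproc):
    """Pair each dataset with its preprocessor by position (illustrative only)."""
    datasets = [this_asset] + list(combine_with)
    if len(preproc) != len(datasets):
        raise ValueError("expected one preprocessor per dataset")
    return list(zip(datasets, preproc))

# Placeholder names standing in for real assets and preprocessors.
pairs = pair_preprocessors(
    this_asset="asset_a",
    combine_with=["asset_b", "asset_c"],
    preproc=["preproc_a", "preproc_b", "preproc_c"],
)
print(pairs)
```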

job_name: Optional[str]

  • Reference name for the job which performs this task.

silent: Optional[bool] = False

  • Whether to suppress status messages during execution. By default, messages are shown.

session: Optional[Session]

  • A connection session. If not specified, the default session is used.

Limitations

  • The group_by parameter supports a single grouping column.
  • When a supplied dataset has missing values (appearing as NaN or Null), the affected rows are dropped before entering the multi-party computation. We recommend preprocessing datasets to handle these values upstream of the operation.
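
The effect of missing values, and one way to handle them upstream, can be sketched locally as follows. The helper names and data are hypothetical, and the sketch assumes that a missing value in any column removes the whole row. Mean imputation is shown only as one possible strategy, not a recommendation for every dataset.

```python
import math

def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def drop_incomplete_rows(rows):
    """Mimic row dropping: remove any row containing a missing value (assumption)."""
    return [row for row in rows if not any(is_missing(v) for v in row.values())]

def impute_with_mean(rows, column):
    """Fill missing values in one column with the mean of the present values."""
    present = [row[column] for row in rows if not is_missing(row[column])]
    fill = sum(present) / len(present)
    return [
        {**row, column: fill if is_missing(row[column]) else row[column]}
        for row in rows
    ]

rows = [
    {"age": 40.0, "bmi": 22.0},
    {"age": 50.0, "bmi": float("nan")},
    {"age": 60.0, "bmi": None},
    {"age": 70.0, "bmi": 30.0},
]

dropped = drop_incomplete_rows(rows)     # rows that would remain without preprocessing
imputed = impute_with_mean(rows, "bmi")  # all rows kept after upstream imputation
print(len(dropped), len(imputed))
```

Without preprocessing, half of the rows in this example would be excluded, which also changes the statistics of the otherwise complete age column.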