Regression

Regression is a statistical technique that relates a dependent (or target) variable to one or more independent (or explanatory) variables. It is a useful first step when trying to understand the relationships among your available features.

Train regression models across both horizontally and vertically partitioned datasets. Both Federated Inference and SMPC Inference are supported.
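As a minimal illustration of the two layouts, the sketch below splits one toy table both ways using plain Python. The column names and values are invented for the example and are not part of the SDK.

```python
# Toy table: one dict per record. All names and values are made up.
records = [
    {"id": 1, "age": 34, "income": 52000, "defaulted": 0},
    {"id": 2, "age": 51, "income": 88000, "defaulted": 0},
    {"id": 3, "age": 27, "income": 41000, "defaulted": 1},
    {"id": 4, "age": 45, "income": 60000, "defaulted": 0},
]

# Horizontal partitioning: each party holds different rows
# that share the same set of columns.
party_a_rows = records[:2]
party_b_rows = records[2:]

# Vertical partitioning: each party holds different columns
# for the same records, linked by a shared identifier.
party_a_cols = [{"id": r["id"], "age": r["age"]} for r in records]
party_b_cols = [
    {"id": r["id"], "income": r["income"], "defaulted": r["defaulted"]}
    for r in records
]
```

In the horizontal case the parties can be stacked row-wise. In the vertical case the records must first be matched on the identifier before features from different parties can be combined.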

Operation

When using add_agreement() to forge an agreement on a trained model, use Operation.EXECUTE for the operation parameter.

When using add_agreement() to allow a counterparty to use your dataset for model training, use the appropriate regression operation parameter below.

Horizontally-Partitioned Regression

Use Operation.REGRESSION in Agreements for distributed and private training of Linear, Logistic, Lasso, or Ridge Regression over individual or horizontally-partitioned datasets. All model types are supported for both Federated Inference and SMPC Inference.

For training, use RegressionModel.train(). For inference, use ModelAsset.infer().
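For intuition about why a regression can be fit over horizontally partitioned data without pooling raw rows, the sketch below fits a one-feature linear regression from per-party summary statistics. This is a textbook illustration in plain Python, not TripleBlind's training protocol; the function names and data are hypothetical, and a privacy-preserving system would also need to protect the summaries themselves.

```python
from typing import List, Tuple

Row = Tuple[float, float]  # (feature x, target y)
Stats = Tuple[float, float, float, float, float]


def local_stats(rows: List[Row]) -> Stats:
    """Summaries one party computes over its own rows only."""
    n = float(len(rows))
    sx = sum(x for x, _ in rows)
    sy = sum(y for _, y in rows)
    sxx = sum(x * x for x, _ in rows)
    sxy = sum(x * y for x, y in rows)
    return (n, sx, sy, sxx, sxy)


def fit_from_stats(stats_list: List[Stats]) -> Tuple[float, float]:
    """Ordinary least squares from the summed per-party summaries."""
    n, sx, sy, sxx, sxy = (sum(col) for col in zip(*stats_list))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Two horizontal partitions of data generated by y = 2x + 1.
party_a = [(0.0, 1.0), (1.0, 3.0)]
party_b = [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]

slope, intercept = fit_from_stats([local_stats(party_a), local_stats(party_b)])
```

The fitted slope and intercept match what a single fit over all five rows would give, because the least-squares solution depends only on these sums.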

Training Parameters

datasets: Union[Asset, List[Asset]]

  • A dataset Asset (or a list of horizontally partitioned Assets)

regression_type: tripleblind.RegressionType

  • Accepts RegressionType.LINEAR, RegressionType.LOGISTIC, RegressionType.LASSO, or RegressionType.RIDGE.

target: str = ""

  • The name of the target column for the training.

model_params: Optional[dict]

preprocessor: Optional[Union[TabularPreprocessor, List[TabularPreprocessor]]]

  • The preprocessor(s) to use against the datasets. When no preprocessor is specified, the default preprocessor selects all columns.

job_name: Optional[str]

  • Reference name for this process. This name will appear in the Access Request, Process History, and Audit Reports.

test_size: Optional[float] = 0.0

data_scale_factor: Optional[int] = 0

weight_scale_factor: Optional[int] = 0

minimum_data_value: Optional[int] = 0

minimum_weight_value: Optional[int] = 0

Vertically-Partitioned Regression

Vertical Datasets

Use Operation.PSI_VERTICAL_REGRESSION_TRAIN for Agreements when you’d like to identify an overlap of matching records across datasets, and train a linear or logistic regression model on the vertically-partitioned intersection.

For training, use PSIVerticalRegressionModel.train(). For inference, use ModelAsset.psi_infer().
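To show the shape of the data such a model is trained on, the sketch below finds the overlap of two vertically partitioned toy datasets and joins their columns. It uses an ordinary set intersection in one process as a stand-in: a real private set intersection would find the overlap without exposing non-matching identifiers or moving data between parties. All names and values are hypothetical.

```python
from typing import Dict

Table = Dict[str, Dict[str, float]]  # identifier -> that party's columns

# The initiator's dataset comes first, mirroring the ordering described here.
initiator: Table = {
    "a1": {"age": 34},
    "a2": {"age": 51},
    "a3": {"age": 27},
}
counterparty: Table = {
    "a2": {"income": 88000},
    "a3": {"income": 41000},
    "a4": {"income": 60000},
}


def join_on_overlap(left: Table, right: Table) -> Table:
    """Keep only identifiers present in both tables and merge their columns."""
    shared = sorted(left.keys() & right.keys())
    return {key: {**left[key], **right[key]} for key in shared}


joined = join_on_overlap(initiator, counterparty)
```

Only records "a2" and "a3" appear in both tables, so only those two rows carry the full feature set and would take part in training.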

Training Parameters

datasets: Union[Asset, List[Asset]]

  • A list of vertically-partitioned Assets, starting with the initiator’s dataset.

match_column: Union[str, List[str]]

  • Name of the column to match on when identifying the vertically-partitioned intersection. If not the same in all datasets, this is a list of column names, starting with the initiator’s dataset and corresponding to the order of Assets provided in the datasets parameter.
  • If a single field name is provided, each dataset must have the same name for that match_column, e.g., “ID”.
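The two accepted forms of match_column can be pictured with a small helper that expands either form into one column name per dataset. The helper resolve_match_columns is hypothetical and written only for this illustration; it is not an SDK function, and the SDK's own validation may differ.

```python
from typing import List, Union


def resolve_match_columns(
    match_column: Union[str, List[str]], n_datasets: int
) -> List[str]:
    """Return one match column name per dataset, in dataset order."""
    if isinstance(match_column, str):
        # A single name means every dataset uses that same column name.
        return [match_column] * n_datasets
    if len(match_column) != n_datasets:
        raise ValueError(
            f"expected {n_datasets} column names, got {len(match_column)}"
        )
    # A list is read positionally, starting with the initiator's dataset.
    return list(match_column)


same_name = resolve_match_columns("ID", 2)
per_dataset = resolve_match_columns(["ID", "patient_id"], 2)
```

Here same_name assigns "ID" to both datasets, while per_dataset pairs "ID" with the initiator's dataset and "patient_id" with the second one.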

regression_type: tripleblind.RegressionType

  • Accepts RegressionType.LINEAR or RegressionType.LOGISTIC.

target: str = ""

  • The name of the target column for the training.

preprocessor: Optional[Union[TabularPreprocessor, List[TabularPreprocessor]]]

  • The preprocessor(s) to use against the datasets. When no preprocessor is specified, the default preprocessor selects all columns.

learning_rate: Optional[float] = 0.1

job_name: Optional[str]

  • Reference name for this process. This name will appear in the Access Request, Process History, and Audit Reports.

PSI Regression using a Decision Tree

See the Decision Tree documentation.

Predictive Model Markup Language (PMML)

Predictive Model Markup Language (PMML) is the leading standard for statistical and data mining models and is supported by over 30 vendors and organizations. With PMML, it is easy to develop a model on one system using one application and deploy the model on another system using another application, simply by transmitting an XML configuration file.

In general, TripleBlind will be a consumer of PMML models as a means for a user to load a model into the TripleBlind system (i.e., create a model asset owned by the user).

TripleBlind has added support for Predictive Model Markup Language (PMML) models to perform inference with both Federated and SMPC security. Currently, a subset of the full PMML specification is supported, including Tree Models and General Regression.
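Before uploading a PMML file, it can help to check which model element it contains. The sketch below does this with Python's standard XML parser. The SUPPORTED set is an assumption that maps "Tree Models" and "General Regression" to the PMML element names TreeModel and GeneralRegressionModel, and the embedded document is a bare skeleton for illustration, not a valid trained model.

```python
import xml.etree.ElementTree as ET
from typing import List

# Assumed mapping from the supported model families to PMML element names.
SUPPORTED = {"TreeModel", "GeneralRegressionModel"}

# Top-level PMML children that are not models.
NON_MODEL = {
    "Header",
    "MiningBuildTask",
    "DataDictionary",
    "TransformationDictionary",
    "Extension",
}

# Skeleton document: structure only, no fitted parameters.
PMML_DOC = """
<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <Header/>
  <DataDictionary numberOfFields="0"/>
  <GeneralRegressionModel modelType="generalLinear" functionName="regression"/>
</PMML>
""".strip()


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def model_elements(pmml_text: str) -> List[str]:
    """List the model element names found directly under the PMML root."""
    root = ET.fromstring(pmml_text)
    names = [local_name(child.tag) for child in root]
    return [name for name in names if name not in NON_MODEL]


found = model_elements(PMML_DOC)
unsupported = [name for name in found if name not in SUPPORTED]
```

An empty unsupported list suggests the file uses one of the model families named here; it does not confirm that every feature inside the model is supported.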

Notes on General Regression:

  • This model type supports a much larger array of models, typically referred to as generalized linear regression models.
  • We support Linear & Logistic Regression (see PMML example in SDK with default.xml model file) with multiple covariates, interaction terms, and polynomial terms.

Limitations

  • PSI is a powerful operation, so it can only be used when the initiator owns at least one of the datasets in the computation. This restriction keeps the private set intersection privacy-preserving: data is not moved between parties, and only membership information of the identifiers is revealed.
  • When using PSI-based operations, the owned dataset must be supplied as the first (or left-side) dataset asset.