# Regression

Regression is a statistical technique that relates a dependent (or target) variable to one or more independent (or explanatory) variables. Fitting a regression is a useful first step when trying to understand the relationships among your available features.
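
To make the idea concrete, here is a minimal sketch of the simplest case: an ordinary least squares fit with one explanatory variable, written in plain Python. The function name and the toy data are illustrative assumptions and are not part of the TripleBlind SDK.

```python
from statistics import mean


def fit_simple_linear(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # slope = sum of cross-deviations / sum of squared x-deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Toy data generated from y = 1 + 2x, so the fit should recover those values.
hours = [0.0, 1.0, 2.0, 3.0, 4.0]
score = [1.0, 3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_simple_linear(hours, score)
print(intercept, slope)
```

The slope describes how much the target changes per unit change in the explanatory variable, which is the kind of relationship a regression is meant to surface.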

Train regression models across both horizontally and vertically partitioned datasets. Both Federated Inference and SMPC Inference are supported.

### Operation

When using `add_agreement()` to forge an agreement on a trained model, use `Operation.EXECUTE` for the `operation` parameter.

When using `add_agreement()` to allow a counterparty to use your dataset for model training, use the appropriate regression `operation` parameter below.

### Horizontally-Partitioned Regression

Use `Operation.REGRESSION` in Agreements for distributed and private training of Linear, Logistic, Lasso, or Ridge Regression over individual or horizontally-partitioned datasets. All model types are supported for both Federated Inference and SMPC Inference.
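
As a rough illustration of what "horizontally partitioned" means, the sketch below uses plain Python lists of row dictionaries. The party names, columns, and the `same_schema` helper are hypothetical; rows are pooled here only to show the logical dataset, which is not how private training handles the data.

```python
# Two parties hold different rows with the same columns (horizontal partitioning).
hospital_a = [
    {"age": 34, "bmi": 22.1, "readmitted": 0},
    {"age": 61, "bmi": 30.4, "readmitted": 1},
]
hospital_b = [
    {"age": 47, "bmi": 27.8, "readmitted": 0},
]


def same_schema(partitions):
    """Return True when every row in every partition has the same column names."""
    column_sets = {frozenset(row) for part in partitions for row in part}
    return len(column_sets) == 1


partitions = [hospital_a, hospital_b]
schema_ok = same_schema(partitions)

# Conceptually, the model is fit to the row-wise union of the partitions.
# Pooling is done here for illustration only.
logical_rows = [row for part in partitions for row in part]
print(schema_ok, len(logical_rows))
```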

For training, use `RegressionModel.train()`. For inference, use `ModelAsset.infer()`.

#### Training Parameters

`datasets: Union[Asset, List[Asset]]`

- A dataset `Asset` (or a list of horizontally partitioned `Assets`)

`regression_type: tripleblind.RegressionType`

- Accepts `RegressionType.LINEAR`, `RegressionType.LOGISTIC`, `RegressionType.LASSO`, or `RegressionType.RIDGE`.

`target: str = ""`

- The name of the target column for the training.

`model_params: Optional[dict]`

- Accepts parameters specified for each scikit-learn equivalent:
  - `sklearn.linear_model.LinearRegression`
  - `sklearn.linear_model.LogisticRegression`
  - `sklearn.linear_model.Lasso`
  - `sklearn.linear_model.Ridge`
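
As a sketch of what a `model_params` dictionary might look like for each regression type, the example below uses constructor arguments that exist on the corresponding scikit-learn estimators. The specific values, the string keys, and the `params_for` helper are illustrative assumptions; which arguments are honored in a given SDK release should be confirmed against the SDK itself.

```python
# Hypothetical example values. The inner keys are real constructor arguments
# of the matching scikit-learn estimators.
model_params_by_type = {
    "LINEAR": {"fit_intercept": True},          # LinearRegression
    "LOGISTIC": {"C": 1.0, "max_iter": 200},    # LogisticRegression
    "LASSO": {"alpha": 0.1, "max_iter": 1000},  # Lasso
    "RIDGE": {"alpha": 1.0},                    # Ridge
}


def params_for(regression_name):
    """Look up a copy of the example parameters for a regression type name."""
    return dict(model_params_by_type[regression_name.upper()])


print(params_for("lasso"))
```

The chosen dictionary would then be passed as the `model_params` argument when training.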

`preprocessor: Optional[Union[TabularPreprocessor, List[TabularPreprocessor]]]`

- The preprocessor(s) to use against the datasets. When no preprocessor is specified, the default preprocessor selects all columns.

`job_name: Optional[str]`

- Reference name for this process. This name will appear in the Access Request, Process History, and Audit Reports.

`test_size: Optional[float] = 0.0`

`data_scale_factor: Optional[int] = 0`

`weight_scale_factor: Optional[int] = 0`

`minimum_data_value: Optional[int] = 0`

`minimum_weight_value: Optional[int] = 0`

### Vertically-Partitioned Regression

#### Vertical Datasets

Use `Operation.PSI_VERTICAL_REGRESSION_TRAIN` for Agreements when you'd like to identify an overlap of matching records across datasets and train a linear or logistic regression model on the vertically-partitioned intersection.

For training, use `PSIVerticalRegressionModel.train()`. For inference, use `ModelAsset.psi_infer()`.

#### Training Parameters

`datasets: Union[Asset, List[Asset]]`

- A list of vertically-partitioned `Assets`, starting with the initiator's dataset.

`match_column: Union[str, List[str]]`

- Name of the column to match on when identifying the vertically-partitioned intersection. If not the same in all datasets, this is a list of column names, starting with the initiator's dataset and corresponding to the order of `Assets` provided in the `datasets` parameter.
  - If a single field name is provided, each dataset must have the same name for that `match_column`, e.g. "ID".
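
The following plain-Python sketch shows the logic that `match_column` describes: one column name per dataset (or a single shared name), used to find records present in every dataset and line up their features. It is a plaintext illustration only and does not perform a private set intersection; the helper names and the toy tables are assumptions made for this example.

```python
def expand_match_column(match_column, n_datasets):
    """Turn a single shared column name into one name per dataset."""
    if isinstance(match_column, str):
        return [match_column] * n_datasets
    return list(match_column)


def vertical_intersection(datasets, match_column):
    """Join vertically partitioned tables (lists of row dicts) on their match columns.

    datasets: list of tables, initiator first.
    match_column: a shared name, or one name per dataset in the same order.
    """
    columns = expand_match_column(match_column, len(datasets))
    # Index every table by its own match column.
    indexed = [
        {row[col]: row for row in table}
        for table, col in zip(datasets, columns)
    ]
    # Keep only identifiers present in every table.
    shared_ids = set(indexed[0])
    for index in indexed[1:]:
        shared_ids &= set(index)
    # Merge the feature columns of each matching record, dropping the match columns.
    joined = []
    for record_id in sorted(shared_ids):
        merged = {}
        for index, col in zip(indexed, columns):
            merged.update({k: v for k, v in index[record_id].items() if k != col})
        joined.append(merged)
    return joined


bank = [{"ID": 1, "income": 52}, {"ID": 2, "income": 71}, {"ID": 3, "income": 38}]
insurer = [
    {"customer_id": 2, "claims": 0},
    {"customer_id": 3, "claims": 2},
    {"customer_id": 4, "claims": 1},
]

rows = vertical_intersection([bank, insurer], ["ID", "customer_id"])
print(rows)
```

Only the records with identifiers 2 and 3 appear in both tables, so the joined result contains two rows, each combining the `income` feature from one party with the `claims` feature from the other.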

`regression_type: tripleblind.RegressionType`

- Accepts `RegressionType.LINEAR` or `RegressionType.LOGISTIC`.

`target: str = ""`

- The name of the target column for the training.

`preprocessor: Optional[Union[TabularPreprocessor, List[TabularPreprocessor]]]`

- The preprocessor(s) to use against the datasets. When no preprocessor is specified, the default preprocessor selects all columns.

`learning_rate: Optional[float] = 0.1`

`job_name: Optional[str]`

- Reference name for this process. This name will appear in the Access Request, Process History, and Audit Reports.

### PSI Regression using a Decision Tree

See Decision Tree documentation.

### Predictive Model Markup Language (PMML)

Predictive Model Markup Language (PMML) is the leading standard for statistical and data mining models and is supported by over 30 vendors and organizations. With PMML, it is easy to develop a model on one system using one application and deploy the model on another system using another application, simply by transmitting an XML configuration file.

In general, TripleBlind will be a consumer of PMML models as a means for a user to load a model into the TripleBlind system (i.e., create a model asset owned by the user).

TripleBlind has added support for Predictive Model Markup Language (PMML) models to perform inference with both Federated and SMPC security. Currently, a subset of the full PMML specification is supported, including Tree Models and General Regression.

Notes on General Regression:

- This model type supports a much larger array of models, typically referred to as generalized linear regression models.
- We support Linear & Logistic Regression (see the PMML example in the SDK with the `default.xml` model file) with multiple covariates, interaction terms, and polynomial terms.
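
Before loading a PMML file, it can be useful to check which model element it contains. The sketch below does this with Python's standard `xml.etree.ElementTree`. The embedded PMML text is a deliberately incomplete skeleton rather than a valid model, and the `SUPPORTED` set simply mirrors the two model types named above; both are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Skeleton only: a real GeneralRegressionModel also needs a MiningSchema,
# parameter lists, and coefficient matrices.
PMML_TEXT = """<PMML version="4.4" xmlns="http://www.dmg.org/PMML-4_4">
  <Header description="toy example"/>
  <DataDictionary numberOfFields="2">
    <DataField name="x" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <GeneralRegressionModel modelType="generalizedLinear" functionName="regression"/>
</PMML>
"""

# Top-level PMML children that are not model elements.
NON_MODEL = {"Header", "MiningBuildTask", "DataDictionary",
             "TransformationDictionary", "Extension"}
SUPPORTED = {"TreeModel", "GeneralRegressionModel"}


def model_elements(pmml_text):
    """Return the local names of the model elements in a PMML document."""
    root = ET.fromstring(pmml_text)
    # Namespaced tags look like "{namespace}LocalName"; keep the local name.
    names = [child.tag.rsplit("}", 1)[-1] for child in root]
    return [name for name in names if name not in NON_MODEL]


found = model_elements(PMML_TEXT)
all_supported = all(name in SUPPORTED for name in found)
print(found, all_supported)
```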

#### Limitations

- PSI is a powerful operation and can only be used when the initiator owns at least one of the datasets in the computation. This restriction makes the private set intersection completely privacy-preserving, as data is not moved between parties. Only membership information of the identifiers is revealed.
- When using PSI-based operations, the owned dataset must be supplied as the first (or left-side) dataset asset.
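
One way to satisfy the ordering requirement is to reorder the dataset list, together with any per-dataset match columns, before starting the operation. The sketch below does this on plain dictionaries; the `name` and `owner` keys and the `put_owned_first` helper are hypothetical stand-ins and do not reflect the SDK's `Asset` class.

```python
def put_owned_first(assets, match_columns, my_org):
    """Reorder parallel lists so that datasets owned by my_org come first.

    assets: list of dicts with hypothetical "name" and "owner" keys.
    match_columns: one match column per asset, in the same order.
    """
    pairs = list(zip(assets, match_columns))
    owned = [pair for pair in pairs if pair[0]["owner"] == my_org]
    if not owned:
        raise ValueError("the initiator must own at least one dataset for PSI")
    others = [pair for pair in pairs if pair[0]["owner"] != my_org]
    ordered = owned + others
    return [asset for asset, _ in ordered], [col for _, col in ordered]


assets = [
    {"name": "insurer_claims", "owner": "insurer"},
    {"name": "bank_accounts", "owner": "bank"},
]
ordered_assets, ordered_columns = put_owned_first(
    assets, ["customer_id", "ID"], my_org="bank"
)
print([asset["name"] for asset in ordered_assets], ordered_columns)
```

Keeping the match columns paired with their datasets during the reorder avoids a mismatch between the two lists.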