Outlier Detection
Identify records in which values deviate from the expected distribution. This occurs in a privacy-preserving manner, never exposing the content of the dataset in the process, and outputs only record identifiers of the outliers, not raw data.
This operation privately analyzes the given dataset values. Each value is then compared to the mean value for the dataset, identifying rows where the value is outside the number of standard deviations given. This is known as the Z Score. For a normal distribution, 99.7% of the values will be within 3 standard deviations of the mean.
Operation
- Use the
detect_outlier()
method. - When using
add_agreement()
to allow a counterparty to use your dataset, or usingcreate_job()
to perform outlier detection, useOperation.OUTLIER_DETECTION
.
Parameters
outlier_algorithm: str = ""
outlier_params: dict = defaultdict
identifier_column: str = ""
Limitations
- Currently only the only
outlier_algorithm
supported isz_score
. - Rows which have values outside of the absolute value of the
z-score
larger than the specifiedstd
parameter will be returned.