Outlier Detection

Identify records whose values deviate from the expected distribution. The analysis is privacy-preserving: the contents of the dataset are never exposed during the process, and the output contains only the record identifiers of the outliers, not raw data.

This operation privately analyzes the values in the given dataset. Each value is compared to the dataset mean and expressed as a Z Score: the number of standard deviations by which the value lies above or below the mean. Rows whose Z Score falls outside the given number of standard deviations, in either direction, are identified as outliers. For a normal distribution, approximately 99.7% of values lie within 3 standard deviations of the mean, so a threshold of 3 flags roughly 0.3% of rows.
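The comparison described above can be written out as follows. This is a sketch of the plain statistic only: it assumes the population standard deviation and uses k as a hypothetical symbol for the std threshold, and it says nothing about how the operation computes these quantities privately.

```latex
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2}, \qquad
z_i = \frac{x_i - \mu}{\sigma}

\text{row } i \text{ is an outlier} \iff |z_i| > k

\text{Example: } x_i = 36,\ \mu = 20,\ \sigma = 5 \;\Rightarrow\; z_i = \frac{36 - 20}{5} = 3.2 > 3 = k

P(|Z| \le 3) = \operatorname{erf}\!\left(\tfrac{3}{\sqrt{2}}\right) \approx 0.9973
```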

Operation

Parameters

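  • outlier_algorithm: str = "". Name of the outlier detection algorithm to run (for example, z_score).
  • outlier_params: dict = defaultdict. Algorithm-specific settings, such as the std threshold used by z_score.
  • identifier_column: str = "". Column whose values identify the outlier records in the output.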
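To show how the three parameters fit together, here is a minimal plaintext reference sketch. It performs no privacy protection and is not the operation's implementation. Several pieces are assumptions: the function name detect_outliers is hypothetical; the documented parameters do not say which column is analyzed, so the sketch takes a hypothetical value_column argument; the threshold is read from an assumed "std" key in outlier_params; and the population standard deviation is used.

```python
import statistics
from typing import Any, Mapping, Sequence


def detect_outliers(
    rows: Sequence[Mapping[str, Any]],
    outlier_algorithm: str,
    outlier_params: Mapping[str, float],
    identifier_column: str,
    value_column: str,  # hypothetical: the documented parameters do not name the analyzed column
) -> list[Any]:
    """Plaintext reference: return identifiers of rows whose |z-score| exceeds the std threshold."""
    if outlier_algorithm != "z_score":
        # Mirrors the documented limitation: z_score is the only supported algorithm.
        raise ValueError(f"unsupported outlier_algorithm: {outlier_algorithm!r}")
    threshold = outlier_params["std"]  # assumed key name, based on the "std parameter"
    values = [float(row[value_column]) for row in rows]
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (an assumption)
    if std == 0:
        # All values are identical, so nothing deviates from the mean.
        return []
    # Only identifiers are returned, matching the operation's output contract.
    return [
        row[identifier_column]
        for row, x in zip(rows, values)
        if abs(x - mean) / std > threshold
    ]
```

For example, with rows built as `[{"id": f"r{i}", "amount": v} for i, v in enumerate([10, 11, 9, 10, 12, 10, 11, 9, 10, 50])]`, calling `detect_outliers(rows, "z_score", {"std": 2.5}, "id", "amount")` returns `["r9"]`, since the value 50 has a z-score of about 2.99 while every other value stays below 0.5.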

Limitations

  • Currently, the only supported outlier_algorithm is z_score.
  • The identifiers of rows whose absolute z-score is larger than the specified std parameter are returned, so unusually low and unusually high values are both reported.
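The limitation above implies a two-sided test, and the choice of std determines how many rows a normally distributed column would be expected to yield. The sketch below illustrates both points. The helper names are hypothetical, "larger than" is read as a strict inequality (an assumption about boundary handling), and the expected fraction holds only if the values really are normally distributed.

```python
import math


def is_flagged(z: float, std: float) -> bool:
    """Two-sided test: a row is returned when |z| is strictly larger than std."""
    return abs(z) > std


def expected_flagged_fraction(std: float) -> float:
    """Share of rows expected to be flagged if the values are normally distributed.

    For a standard normal Z, P(|Z| > std) = erfc(std / sqrt(2)).
    """
    return math.erfc(std / math.sqrt(2))


if __name__ == "__main__":
    print(is_flagged(-3.5, 3))  # True: large negative deviations are flagged too
    print(is_flagged(3.0, 3))   # False: a z-score exactly at the threshold is not larger than it
    for k in (2, 3, 4):
        print(k, f"{expected_flagged_fraction(k):.4%}")
```

A threshold of 3 flags roughly 0.27% of normally distributed rows, consistent with the 99.7% figure; raising std makes the detector more conservative, and lowering it flags more rows.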