Statistics

Feature Store can compute statistics for feature groups and datasets and persist them together with their metadata. The statistics are computed at materialisation time and can help you assess data quality.

Note

Feature Store uses MLM Insights, a Python API that helps evaluate and monitor data across the entire ML observability lifecycle. It performs data summarization, which reduces a dataset to a set of descriptive statistics.

The statistical metrics that Feature Store computes depend on the feature type.

Numerical Metrics          Categorical Metrics
Skewness                   Count
StandardDeviation          TopKFrequentElements
Min                        TypeMetric
IsConstantFeature          DuplicateCount
IQR                        Mode
Range                      DistinctCount
ProbabilityDistribution
Variance
FrequencyDistribution
Count
Max
DistinctCount
Sum
IsQuasiConstantFeature
Quartiles
Mean
Kurtosis
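To make the metric names above more concrete, the following minimal sketch computes a subset of them with the Python standard library only. It is not the MLM Insights implementation. The function names `numerical_metrics` and `categorical_metrics` are hypothetical, and the exact definitions (sample rather than population variance, the default quartile method, `DuplicateCount` counted as values that repeat an earlier value) are assumptions made for illustration.

```python
import statistics
from collections import Counter


def numerical_metrics(values):
    """Illustrative subset of the numerical metrics (needs >= 2 values)."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    distinct = len(set(values))
    return {
        "Count": len(values),
        "Min": min(values),
        "Max": max(values),
        "Sum": sum(values),
        "Mean": statistics.fmean(values),
        "Variance": statistics.variance(values),        # sample variance (assumption)
        "StandardDeviation": statistics.stdev(values),  # sample std dev (assumption)
        "Range": max(values) - min(values),
        "Quartiles": (q1, q2, q3),
        "IQR": q3 - q1,
        "DistinctCount": distinct,
        "IsConstantFeature": distinct == 1,
    }


def categorical_metrics(values, k=3):
    """Illustrative subset of the categorical metrics."""
    counts = Counter(values)
    return {
        "Count": len(values),
        "DistinctCount": len(counts),
        # Assumed definition: entries that repeat an earlier value.
        "DuplicateCount": len(values) - len(counts),
        "Mode": counts.most_common(1)[0][0],
        "TopKFrequentElements": counts.most_common(k),
    }


if __name__ == "__main__":
    ages = [23, 35, 35, 41, 52]
    colours = ["red", "blue", "red", "green", "red", "blue"]
    print(numerical_metrics(ages))
    print(categorical_metrics(colours, k=2))
```

Splitting the computation by feature type mirrors the two columns of the table: a metric such as `Mean` only makes sense for numerical features, while `TopKFrequentElements` is most useful for categorical ones.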

Drift Monitoring

Models can fail silently. The root cause of model issues in production can often be traced back to the data rather than the model. By applying data monitoring to the feature store, practitioners can automatically catch upstream data issues, such as missing values, changes in data format, unexpected values (changes in data cardinality), and data drift, before the models are impacted.

_images/drift_monitoring.png
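The kinds of checks described in this section can be sketched in a few lines of plain Python. This is an illustration of the idea and not how Feature Store implements drift monitoring. The helper names `missing_rate`, `unseen_categories`, and `psi` are hypothetical, and the Population Stability Index (PSI) is used here only as one common drift measure, with equal-width bins chosen as an assumption.

```python
import math
from collections import Counter


def missing_rate(values):
    """Fraction of entries that are None."""
    return sum(v is None for v in values) / len(values)


def unseen_categories(baseline, current):
    """Categories in the current batch that never appeared in the baseline."""
    return set(current) - set(baseline)


def psi(baseline, current, bins=5, eps=1e-6):
    """Population Stability Index between two numeric samples (no None values).

    Bin edges are equal-width over the baseline range (an assumption);
    eps keeps empty bins from producing log(0) or division by zero.
    """
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant baseline

    def proportions(sample):
        # Clamp out-of-range values into the first or last bin.
        counts = Counter(
            min(max(int((x - lo) / width), 0), bins - 1) for x in sample
        )
        return [max(counts.get(b, 0) / len(sample), eps) for b in range(bins)]

    p, q = proportions(baseline), proportions(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


if __name__ == "__main__":
    baseline = [float(x) for x in range(100)]
    shifted = [x + 50.0 for x in baseline]
    print("missing rate:", missing_rate([1.0, None, 3.0, None]))
    print("unseen categories:", unseen_categories(["a", "b"], ["a", "c"]))
    print("PSI, same data:", psi(baseline, baseline))
    print("PSI, shifted data:", psi(baseline, shifted))
```

A PSI of zero means the binned distributions are identical, and larger values indicate a larger shift. Which threshold should trigger an alert depends on the feature and the model, so it is left out of this sketch.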