Statistics

Feature Store can compute statistics for feature groups and datasets and persist them together with the associated metadata. These statistics can help you derive insights about data quality. The statistical metrics are computed at materialisation time and persisted with the other metadata.

Note

Feature Store uses MLM Insights, a Python API that helps evaluate and monitor data across the entire ML observability lifecycle. It performs data summarization, which reduces a dataset to a set of descriptive statistics.

The statistical metrics that Feature Store computes depend on the feature type.

Numerical Metrics	Categorical Metrics
Skewness	Count
StandardDeviation	TopKFrequentElements
Min	TypeMetric
IsConstantFeature	DuplicateCount
IQR	Mode
Range	DistinctCount
ProbabilityDistribution
Variance
FrequencyDistribution
Count
Max
DistinctCount
Sum
IsQuasiConstantFeature
Quartiles
Mean
Kurtosis
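
To make a few of the metric names above concrete, here is a minimal sketch that computes textbook versions of them with the Python standard library only. It is not how Feature Store or MLM Insights computes them: the function names `numerical_summary` and `categorical_summary` are hypothetical, and the definitions chosen here (sample variance, inclusive quartiles, `DuplicateCount` as the number of repeated entries) are assumptions that may differ from the library's.

```python
import statistics
from collections import Counter


def numerical_summary(values):
    """Textbook versions of some numerical metrics (illustrative only).

    Needs at least two values, because sample variance is undefined otherwise.
    """
    # Assumption: quartiles use the "inclusive" interpolation method.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Count": len(values),
        "Min": min(values),
        "Max": max(values),
        "Sum": sum(values),
        "Mean": statistics.mean(values),
        "Variance": statistics.variance(values),  # sample variance (n - 1)
        "StandardDeviation": statistics.stdev(values),
        "Range": max(values) - min(values),
        "Quartiles": (q1, q2, q3),
        "IQR": q3 - q1,
        "DistinctCount": len(set(values)),
        "IsConstantFeature": len(set(values)) == 1,
    }


def categorical_summary(values, k=3):
    """Textbook versions of some categorical metrics (illustrative only)."""
    counts = Counter(values)
    return {
        "Count": len(values),
        "DistinctCount": len(counts),
        "Mode": counts.most_common(1)[0][0],
        "TopKFrequentElements": counts.most_common(k),
        # Assumption: entries that repeat an already-seen value.
        "DuplicateCount": len(values) - len(counts),
    }
```

Calling `numerical_summary([1, 2, 3, 4, 5])` or `categorical_summary(["a", "b", "a", "c", "a"])` returns a plain dictionary keyed by metric name, which makes it easy to compare against the persisted statistics by eye.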

Drift Monitoring

Models can fail silently. The root cause of model issues in production can often be traced back to the data itself rather than the model. By applying data monitoring to the feature store, practitioners can automatically catch upstream data issues such as missing values, changes in data format, unexpected values (changes in data cardinality), and data drift before the models are impacted.
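
The kinds of checks described above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the Feature Store's drift monitoring implementation: the helper names are hypothetical, missing values are assumed to be represented as `None`, and the Population Stability Index (PSI) is used here as one common drift measure rather than the one the product necessarily applies.

```python
import math


def missing_rate(values):
    """Fraction of entries that are None (assumed missing-value marker)."""
    return sum(v is None for v in values) / len(values)


def unexpected_categories(baseline, current):
    """Categories in the current batch that never appear in the baseline."""
    return set(current) - set(baseline)


def psi(baseline, current, bins=10, eps=1e-6):
    """Population Stability Index between two numeric samples.

    Bins are equal-width over the baseline range; current values outside
    that range are clamped into the first or last bin.
    """
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant baseline

    def proportions(sample):
        counts = [0] * bins
        for v in sample:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps keeps empty bins from causing log(0) or division by zero
        return [max(c / len(sample), eps) for c in counts]

    p, q = proportions(baseline), proportions(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))
```

A PSI of zero means the binned distributions are identical, and larger values indicate a larger shift. The threshold at which a shift should raise an alert is application-specific and would need to be tuned per feature.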