Outlier Detection

Outlier detection finds the records that don't fit the rest of the dataset, such as abnormal revenue, odd email patterns, or sudden spikes, and flags them for review or special handling.

What is outlier detection?

Outlier detection identifies data points that deviate significantly from the rest. In GTM data, outliers can be informative (a $10M deal in a $50K-average dataset is real signal) or polluting (a misformatted revenue field that landed as 1000× the actual value).

Why it matters

Outliers skew averages and break ML/AI models trained on dirty data
Some outliers are your best customers; others are data-entry mistakes
Catching them early keeps errors from cascading downstream

Detection techniques

Statistical: z-score, IQR, modified z-score
Distance- and density-based: k-NN distance, DBSCAN
Model-based: isolation forest, autoencoder reconstruction error
Rule-based: values outside business-defined ranges
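
As a rough illustration of the statistical techniques above, here is a minimal Python sketch of the IQR rule and the modified z-score using only the standard library. The function names, the sample deal sizes, and the cutoffs (1.5 × IQR and a modified z-score above 3.5) are illustrative assumptions based on common conventions, not values prescribed by this article.

```python
import statistics

def iqr_flags(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (k=1.5 is a common convention)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v < low or v > high for v in values]

def modified_z_scores(values):
    """Median/MAD-based scores; 0.6745 rescales the MAD so that scores are
    comparable to ordinary z-scores when the data is roughly normal."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        # At least half the values equal the median, so the score is
        # undefined; this sketch simply flags nothing in that case.
        return [0.0] * len(values)
    return [0.6745 * (v - median) / mad for v in values]

# Hypothetical deal sizes: seven near $50K and one $10M deal.
deal_sizes = [42_000, 48_000, 50_000, 51_000, 55_000, 47_000, 53_000, 10_000_000]

iqr_result = iqr_flags(deal_sizes)
mz_result = [abs(z) > 3.5 for z in modified_z_scores(deal_sizes)]

for size, by_iqr, by_mz in zip(deal_sizes, iqr_result, mz_result):
    print(f"{size:>12,}  iqr={by_iqr}  modified_z={by_mz}")
```

On this sample both methods flag only the $10M deal. A plain z-score with a cutoff of 3 would miss it here, because a single extreme value in such a small sample inflates the standard deviation it is measured against; the median-based methods are less sensitive to that.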

What to do with outliers

Tag, don't delete: outliers are often informative
Review thresholds quarterly
For ML training, exclude or winsorize; for business reporting, surface them
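
The handling advice above could be sketched in Python along these lines: tag rows against a business-defined range instead of deleting them, and winsorize a copy of the values for model training. The field names, the revenue range, and the choice of replacing one value per tail are hypothetical; real thresholds would come from your own data and review process.

```python
def tag_outliers(rows, field, low, high):
    """Return copies of the rows with an added 'is_outlier' flag.
    No row is removed and the input rows are left unchanged."""
    return [{**row, "is_outlier": not (low <= row[field] <= high)} for row in rows]

def winsorize(values, k=1):
    """Replace the k smallest and k largest values with the nearest
    remaining value, keeping the original order and length."""
    if k <= 0:
        return list(values)
    if 2 * k >= len(values):
        raise ValueError("k is too large for this many values")
    ordered = sorted(values)
    low, high = ordered[k], ordered[-k - 1]
    return [min(max(v, low), high) for v in values]

# Hypothetical account records and an assumed plausible revenue range.
accounts = [
    {"account": "acme", "revenue": 50_000},
    {"account": "globex", "revenue": 47_000},
    {"account": "initech", "revenue": 10_000_000},
]
tagged = tag_outliers(accounts, "revenue", low=1_000, high=1_000_000)

# For reporting, surface the flagged rows; for training, use winsorized values.
for row in tagged:
    print(row)

training_values = winsorize(
    [42_000, 48_000, 50_000, 51_000, 55_000, 47_000, 53_000, 10_000_000]
)
print(training_values)
```

Keeping the flag as a separate column means the same dataset can serve both uses: reports can highlight the tagged rows, while a training pipeline can exclude or winsorize them.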

How TexAu helps

Use AI Column to flag rows whose values look anomalous given the surrounding data. It is a quick way to surface data-quality bugs and high-value edge cases at the same time.