Outlier Detection
Outlier detection finds the records that don't fit the rest of the dataset — abnormal revenue, weird email patterns, sudden spikes — flagging them for review or special handling.
What is outlier detection?
Outlier detection identifies data points that deviate significantly from the rest. In GTM data, outliers can be informative (a $10M deal in a $50K-average dataset is real signal) or polluting (a misformatted revenue field that landed as 1000× the actual value).
Why it matters
- Outliers skew averages and break ML/AI models trained on dirty data
- Some outliers are your best customers; others are data-entry mistakes
- Catching them early prevents downstream cascade errors
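To make the first point concrete, here is a small sketch of how one extreme record moves the average while the median stays put. The deal sizes are made up for illustration and are not taken from any real dataset.

```python
from statistics import mean, median

# Hypothetical deal sizes in USD: nine typical deals plus one $10M deal.
deals = [42_000, 47_000, 48_000, 49_000, 50_000,
         51_000, 52_000, 53_000, 55_000, 10_000_000]

typical = deals[:-1]  # the same list without the $10M deal

avg_without = mean(typical)  # just under 50,000
avg_with = mean(deals)       # 1,044,700: one record moved the average ~20x
mid_with = median(deals)     # 50,500: the median barely notices

print(f"mean without outlier: {avg_without:,.0f}")
print(f"mean with outlier:    {avg_with:,.0f}")
print(f"median with outlier:  {mid_with:,.0f}")
```

Whether that $10M record is a great customer or a data-entry mistake, any report built on the mean now describes a dataset that does not exist.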
Detection techniques
- Statistical: z-score, modified z-score, IQR (interquartile range)
- Distance- and density-based: k-NN distance, DBSCAN
- Model-based: isolation forest, autoencoder reconstruction error
- Rule-based: values outside business-defined ranges
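As a rough illustration of the statistical techniques, the sketch below implements the IQR fence and the modified z-score with the Python standard library only. The function names, the sample revenues, and the cutoffs (1.5 × IQR and 3.5) are assumptions chosen for the example; they are common defaults, not values prescribed by this article.

```python
from statistics import median, quantiles

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def modified_z_outliers(values, threshold=3.5):
    """Return values whose modified z-score exceeds the threshold.

    The modified z-score uses the median and the median absolute
    deviation (MAD), so the outlier itself cannot inflate the scale.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against
    return [v for v in values if abs(0.6745 * (v - med) / mad) > threshold]

# Hypothetical revenue column with one suspicious value.
revenues = [42_000, 47_000, 48_000, 49_000, 50_000,
            51_000, 52_000, 53_000, 55_000, 10_000_000]

print(iqr_outliers(revenues))
print(modified_z_outliers(revenues))
```

Both functions are built on medians and quartiles, which is why they still work when the outlier is large enough to distort a plain mean and standard deviation.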
What to do with outliers
- Tag, don't delete: outliers are often informative
- Review thresholds quarterly
- For ML training, exclude or winsorize; for business reporting, surface them
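The tag-versus-winsorize split might look like the sketch below. The helper names, the `is_outlier` field, and the bounds are hypothetical; in practice the bounds would come from whichever detection technique and thresholds you settled on.

```python
def tag_outliers(rows, field, low, high):
    """Return copies of the rows with an 'is_outlier' flag; nothing is dropped."""
    return [{**row, "is_outlier": not (low <= row[field] <= high)}
            for row in rows]

def winsorize(values, low, high):
    """Cap values at [low, high] so extremes are limited rather than removed."""
    return [min(max(v, low), high) for v in values]

# Hypothetical records and bounds, for illustration only.
rows = [
    {"account": "A", "revenue": 50_000},
    {"account": "B", "revenue": 10_000_000},
]
low, high = 39_125, 62_125

tagged = tag_outliers(rows, "revenue", low, high)    # for business reporting
capped = winsorize([r["revenue"] for r in rows], low, high)  # for ML training

print(tagged)
print(capped)
```

Tagging keeps the original value available for review, while the capped copy feeds model training without letting a single record dominate.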
How TexAu helps
Use AI Column to flag rows whose values look anomalous given the surrounding data. It is a quick way to surface data-quality bugs and high-value edge cases at the same time.