
Outlier Detection

Outlier detection finds the records that don't fit the rest of the dataset — abnormal revenue, weird email patterns, sudden spikes — flagging them for review or special handling.

What is outlier detection?

Outlier detection identifies data points that deviate significantly from the rest. In GTM data, outliers can be informative (a $10M deal in a $50K-average dataset is real signal) or polluting (a misformatted revenue field that landed as 1000× the actual value).

Why it matters

  • Outliers skew averages and degrade ML/AI models trained on the uncleaned data
  • Some outliers are your best customers; others are data-entry mistakes
  • Catching them early prevents errors from cascading downstream
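
As a small illustration of the first point, the sketch below compares the mean and the median of a handful of deal sizes. The numbers are hypothetical: five deals near $50K plus one value recorded at 1000× its real size.

```python
from statistics import mean, median

# Hypothetical deal sizes in dollars. The last value is a data-entry error:
# a 50,000 deal recorded as 50,000,000.
deals = [42_000, 48_000, 50_000, 51_000, 55_000, 50_000_000]

avg = mean(deals)    # about 8.37 million, far above every legitimate deal
mid = median(deals)  # stays at the level of a typical deal

print(f"mean:   {avg:,.0f}")
print(f"median: {mid:,.0f}")
```

One bad record is enough to move the average by two orders of magnitude, while the median barely notices it. That gap is itself a cheap signal that something in the column deserves a look.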

Detection techniques

  • Statistical: z-score, IQR, modified z-score
  • Distance- and density-based: k-NN distance, DBSCAN
  • Model-based: isolation forest, autoencoder reconstruction error
  • Rule-based: values outside business-defined ranges
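
Two of the statistical techniques above can be sketched with the Python standard library alone. This is a minimal illustration, not a production detector: the function names, the sample data, and the cutoffs (1.5 × IQR fences, a modified z-score threshold of 3.5) are assumptions based on common conventions, and real datasets may call for different values.

```python
from statistics import median, quantiles


def iqr_outliers(values, k=1.5):
    """Return values outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def modified_z_outliers(values, threshold=3.5):
    """Return values whose modified z-score exceeds the threshold.

    The score uses the median and the median absolute deviation (MAD)
    in place of the mean and standard deviation, so a single extreme
    value cannot inflate the scale it is measured against.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # More than half the values are identical; the score is undefined.
        return []
    return [v for v in values if abs(0.6745 * (v - med) / mad) > threshold]


# Hypothetical revenue column with one misformatted entry.
revenue = [38_000, 42_000, 45_000, 48_000, 50_000,
           51_000, 53_000, 55_000, 60_000, 50_000_000]

print(iqr_outliers(revenue))
print(modified_z_outliers(revenue))
```

Both functions flag only the 50,000,000 entry on this sample. The plain z-score is left out on purpose: it relies on the mean and standard deviation, which the outlier itself distorts, so the median-based variants tend to be safer on small or dirty columns.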

What to do with outliers

  • Tag, don't delete: outliers are often informative
  • Review thresholds quarterly
  • For ML training, exclude or winsorize; for business reporting, surface them
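
The difference between tagging and winsorizing can be shown in a few lines. The sketch below is illustrative only: `tag_outliers`, `winsorize`, the `is_outlier` field, and the bounds are hypothetical names and values, and in practice the bounds would come from percentiles, IQR fences, or a business rule.

```python
def tag_outliers(rows, field, low, high):
    """Return copies of the rows with an 'is_outlier' flag; no row is dropped."""
    return [{**row, "is_outlier": not (low <= row[field] <= high)} for row in rows]


def winsorize(values, low, high):
    """Clamp each value into [low, high] instead of removing it."""
    return [min(max(v, low), high) for v in values]


# Hypothetical account records and bounds.
accounts = [
    {"account": "A", "revenue": 50_000},
    {"account": "B", "revenue": 50_000_000},
]
LOW, HIGH = 26_250, 74_250

# Business reporting: keep every row and surface the flag for review.
tagged = tag_outliers(accounts, "revenue", LOW, HIGH)

# ML training: cap extreme values so they cannot dominate the fit.
capped = winsorize([row["revenue"] for row in accounts], LOW, HIGH)

print(tagged)
print(capped)
```

Tagging leaves the original value intact, so a reviewer can still decide whether account B is a real enterprise deal or a formatting bug. Winsorizing changes the value, which is why it suits a training copy of the data better than the system of record.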

How TexAu helps

Use AI Column to flag rows whose values look anomalous given the surrounding data. It is a quick way to surface data-quality bugs and high-value edge cases at the same time.
