netneurotools.stats.get_mad_outliers

netneurotools.stats.get_mad_outliers(data, thresh=3.5)

Determine which samples in data are outliers.

Uses the median absolute deviation (MAD) to determine whether datapoints are outliers.

Parameters:

data ((N, M) array_like) – Data array, where N is the number of samples and M is the number of features
thresh (float, optional) – Modified z-score threshold. Observations whose modified z-score (based on the median absolute deviation) is greater than this value are classified as outliers. Default: 3.5

Returns:

outliers – Boolean array where True indicates an outlier

Return type:

(N,) numpy.ndarray

Notes

Taken directly from https://stackoverflow.com/a/22357811
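
For readers who want to see what a MAD-based modified z-score looks like in practice, the following is a minimal sketch in the spirit of the approach described by Iglewicz and Hoaglin. It is not the netneurotools source: the function name mad_outliers_sketch is hypothetical, and the details (measuring each sample's Euclidean distance from the per-feature median, and the 0.6745 scaling constant) are assumptions made for illustration.

```python
import numpy as np


def mad_outliers_sketch(data, thresh=3.5):
    """Hypothetical illustration only; not the library's implementation."""
    data = np.asarray(data, dtype=float)
    if data.ndim == 1:
        # Treat a 1D input as N samples of a single feature
        data = data[:, None]

    # Per-feature median across samples (shape: (M,))
    median = np.median(data, axis=0)

    # Euclidean distance of each sample from the median point (shape: (N,))
    dist = np.sqrt(np.sum((data - median) ** 2, axis=-1))

    # Median of those distances, which plays the role of the MAD here.
    # If this is zero (more than half the samples sit exactly at the
    # median), the division below yields inf/nan; this sketch does not
    # guard against that case.
    mad = np.median(dist)

    # 0.6745 is approximately the 0.75 quantile of the standard normal,
    # which puts the score on a scale comparable to an ordinary z-score
    modified_z = 0.6745 * dist / mad

    return modified_z > thresh


X = np.array([[0, 5, 10, 15], [1, 4, 11, 16], [100, 100, 100, 100]])
flags = mad_outliers_sketch(X)
```

Under these assumptions the result is one boolean per sample (row), not per element, which matches the (N,) return shape documented above.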

References

Boris Iglewicz and David Hoaglin (1993), “Volume 16: How to Detect and Handle Outliers”, The ASQC Basic References in Quality Control: Statistical Techniques, Edward F. Mykytka, Ph.D., Editor.

Examples

>>> import numpy as np
>>> from netneurotools import stats

Create an array with three samples of four features each:

>>> X = np.array([[0, 5, 10, 15], [1, 4, 11, 16], [100, 100, 100, 100]])
>>> X
array([[  0,   5,  10,  15],
       [  1,   4,  11,  16],
       [100, 100, 100, 100]])

Determine which sample(s) are outliers:

>>> outliers = stats.get_mad_outliers(X)
>>> outliers
array([False, False,  True])
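
A common next step is to use the returned boolean array as a mask to drop the flagged samples. The sketch below uses plain NumPy boolean indexing; the mask is hard-coded to the result shown above so the snippet runs without netneurotools installed, and the name X_clean is only illustrative.

```python
import numpy as np

X = np.array([[0, 5, 10, 15], [1, 4, 11, 16], [100, 100, 100, 100]])

# Mask copied from the example output above (True marks an outlier)
outliers = np.array([False, False, True])

# Keep only the rows (samples) that were not flagged
X_clean = X[~outliers]

# Number of samples flagged as outliers
n_outliers = int(outliers.sum())
```

Because the mask has one entry per sample, it indexes the first axis of X and leaves the feature columns untouched.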