netneurotools.stats.get_mad_outliers
- netneurotools.stats.get_mad_outliers(data, thresh=3.5)
Determine which samples in data are outliers.
Uses the median absolute deviation (MAD) to determine whether data points are outliers.
- Parameters:
data ((N, M) array_like) – Data array where N is samples and M is features
thresh (float, optional) – Modified z-score threshold. Observations with a modified z-score (based on the median absolute deviation) greater than this value are classified as outliers. Default: 3.5
- Returns:
outliers – Boolean array where True indicates an outlier
- Return type:
(N,) numpy.ndarray
Notes
Taken directly from https://stackoverflow.com/a/22357811
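As a rough illustration of the approach described above, the following is a minimal sketch of a MAD-based modified z-score test, modeled on the linked Stack Overflow answer. The function name `mad_outliers_sketch` is hypothetical, and the library's actual implementation may differ in its details. The constant 0.6745 is approximately the 0.75 quantile of the standard normal distribution, which puts the scaled score on a scale comparable to a z-score for normally distributed data.

```python
import numpy as np


def mad_outliers_sketch(data, thresh=3.5):
    """Hypothetical sketch; not the netneurotools implementation."""
    data = np.asarray(data, dtype=float)
    if data.ndim == 1:
        # Treat a 1-D input as N samples with a single feature
        data = data[:, None]

    # Feature-wise median across samples
    median = np.median(data, axis=0)

    # Euclidean distance of each sample from the median point
    dist = np.sqrt(np.sum((data - median) ** 2, axis=-1))

    # Median of those distances (the MAD); this sketch assumes it is
    # non-zero, otherwise the division below is undefined
    mad = np.median(dist)

    # Modified z-score; samples above the threshold are flagged
    modified_z = 0.6745 * dist / mad
    return modified_z > thresh
```

Because each sample is reduced to a single distance before scoring, the result has one boolean per sample rather than one per feature, which matches the (N,) return shape documented above.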
References
Boris Iglewicz and David Hoaglin (1993), “Volume 16: How to Detect and Handle Outliers”, The ASQC Basic References in Quality Control: Statistical Techniques, Edward F. Mykytka, Ph.D., Editor.
Examples
>>> import numpy as np
>>> from netneurotools import stats
Create array with three samples of four features each:
>>> X = np.array([[0, 5, 10, 15], [1, 4, 11, 16], [100, 100, 100, 100]])
>>> X
array([[  0,   5,  10,  15],
       [  1,   4,  11,  16],
       [100, 100, 100, 100]])
Determine which sample(s) are outliers:
>>> outliers = stats.get_mad_outliers(X)
>>> outliers
array([False, False,  True])