DocumentCode :
2793867
Title :
Data outlier detection using the Chebyshev theorem
Author :
Amidan, Brett G. ; Ferryman, Thomas A. ; Cooley, Scott K.
Author_Institution :
Battelle-Pacific Northwest Div., Richland, WA
fYear :
2005
fDate :
5-12 March 2005
Firstpage :
3814
Lastpage :
3819
Abstract :
During data collection and analysis, it is often necessary to identify and possibly remove outliers that exist. An objective method for identifying outliers to be removed is critical. Many automated outlier detection methods are available. However, many are limited by assumptions of a distribution or require upper and lower predefined boundaries in which the data should exist. If there is a known distribution for the data, then using that distribution can aid in finding outliers. Often, a distribution is not known, or the experimenter does not want to make an assumption about a certain distribution. Also, enough information may not exist about a set of data to be able to determine reliable upper and lower boundaries. For these cases, an outlier detection method, using the empirical data and based upon Chebyshev's inequality, was formed. This method allows for detection of multiple outliers, not just one at a time. This method also assumes that the data are independent measurements and that a relatively small percentage of outliers are contained in the data. Chebyshev's inequality gives a bound of what percentage of the data falls outside of k standard deviations from the mean. This calculation holds no assumptions about the distribution of the data. If the data are known to be unimodal without a known distribution, then the method can be improved by using the unimodal Chebyshev inequality. The Chebyshev outlier detection method uses the Chebyshev inequality to calculate upper and lower outlier detection limits. Data values that are not within the range of the upper and lower limits would be considered data outliers. Outliers could be due to erroneous data or could indicate that the data are correct but highly unusual. This algorithm does not ascertain the reason for the outlier; it identifies potential outlier data, allowing for domain experts to investigate the cause.
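The limit calculation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' published procedure, whose details the abstract does not give. It relies on the standard form of Chebyshev's inequality, P(|X - mean| >= k*sd) <= 1/k^2, and, for the unimodal case, on the bound 4/(9k^2), which holds for k >= sqrt(8/3). The function names `chebyshev_limits` and `flag_outliers`, the parameter `p` (the largest fraction of data allowed outside the limits), and the use of the sample mean and sample standard deviation in place of the true values are all assumptions made for illustration.

```python
import math
import statistics


def chebyshev_limits(values, p=0.01, unimodal=False):
    """Return (lower, upper) detection limits for a list of numbers.

    p is the largest fraction of the data that the chosen inequality
    allows outside the limits. The sample mean and sample standard
    deviation stand in for the true values (an assumption of this sketch).
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    mean = statistics.fmean(values)
    std = statistics.stdev(values)
    if unimodal:
        # Unimodal bound: P(|X - mean| >= k*sd) <= 4 / (9 k^2),
        # valid only for k >= sqrt(8/3), which corresponds to p <= 1/6.
        if p > 1 / 6:
            raise ValueError("the unimodal bound requires p <= 1/6")
        k = 2 / (3 * math.sqrt(p))
    else:
        # General bound: P(|X - mean| >= k*sd) <= 1 / k^2.
        k = 1 / math.sqrt(p)
    return mean - k * std, mean + k * std


def flag_outliers(values, p=0.01, unimodal=False):
    """Return the values that fall outside the Chebyshev limits."""
    lower, upper = chebyshev_limits(values, p, unimodal)
    return [v for v in values if v < lower or v > upper]


# Hypothetical data: thirty ordinary readings and one extreme value.
data = [9.0, 10.0, 11.0] * 10 + [100.0]
suspects = flag_outliers(data, p=0.05)
```

Because an extreme value inflates the sample standard deviation, the limits in this single-pass sketch are conservative, and a smaller `p` widens them further. Flagged values are only candidates for review; as the abstract notes, deciding whether they are errors or genuine but unusual measurements is left to domain experts.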
Keywords :
Chebyshev approximation; data analysis; statistical distributions; Chebyshev theorem; automated outlier detection; data collection; data outlier detection; empirical data; unimodal Chebyshev inequality; Biographies; Calibration; Electric breakdown; Humans; Instruments
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Aerospace Conference, 2005 IEEE
Conference_Location :
Big Sky, MT
Print_ISBN :
0-7803-8870-4
Type :
conf
DOI :
10.1109/AERO.2005.1559688
Filename :
1559688