DocumentCode :
28364
Title :
Reverse Nearest Neighbors in Unsupervised Distance-Based Outlier Detection
Author :
Radovanovic, Milos ; Nanopoulos, Alexandros ; Ivanovic, Mirjana
Author_Institution :
Fac. of Sci., Univ. of Novi Sad, Novi Sad, Serbia
Volume :
27
Issue :
5
fYear :
2015
fDate :
May 1 2015
Firstpage :
1369
Lastpage :
1382
Abstract :
Outlier detection in high-dimensional data presents various challenges resulting from the “curse of dimensionality.” A prevailing view is that distance concentration, i.e., the tendency of distances in high-dimensional data to become indiscernible, hinders the detection of outliers by making distance-based methods label all points as almost equally good outliers. In this paper, we provide evidence supporting the opinion that such a view is too simple, by demonstrating that distance-based methods can produce more contrasting outlier scores in high-dimensional settings. Furthermore, we show that high dimensionality can have a different impact, by reexamining the notion of reverse nearest neighbors in the unsupervised outlier-detection context. Namely, it was recently observed that the distribution of points' reverse-neighbor counts becomes skewed in high dimensions, resulting in the phenomenon known as hubness. We provide insight into how some points (antihubs) appear very infrequently in k-NN lists of other points, and explain the connection between antihubs, outliers, and existing unsupervised outlier-detection methods. By evaluating the classic k-NN method, the angle-based technique designed for high-dimensional data, the density-based local outlier factor and influenced outlierness methods, and antihub-based methods on various synthetic and real-world data sets, we offer novel insight into the usefulness of reverse neighbor counts in unsupervised outlier detection.
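Illustration (not part of the published abstract): the reverse-neighbor count described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `reverse_neighbor_counts`, the Euclidean metric, the index-based tie-breaking, and the toy data are all hypothetical choices made for the example. It counts, for each point, how many other points include it in their k-NN lists; a point with a very low count is an antihub in the sense used above.

```python
import math

def reverse_neighbor_counts(points, k):
    """For each point, count how many other points list it among their k nearest neighbors."""
    n = len(points)
    counts = [0] * n
    for i in range(n):
        # Rank every other point by Euclidean distance from point i
        # (ties broken by index; an assumption made for this sketch).
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: (math.dist(points[i], points[j]), j),
        )
        # Each of the k nearest neighbors of point i gains one reverse neighbor.
        for j in others[:k]:
            counts[j] += 1
    return counts

# Hypothetical toy data: a tight cluster of four points plus one distant point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
counts = reverse_neighbor_counts(points, k=2)

# The point with the fewest reverse neighbors is the strongest antihub candidate.
antihub_index = min(range(len(points)), key=lambda i: counts[i])
```

In this toy set the distant point appears in no other point's 2-NN list, so its count is zero, while every cluster point is counted at least twice. The brute-force loop is quadratic in the number of points and is meant only to make the definition concrete.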
Keywords :
data analysis; pattern classification; unsupervised learning; data dimensionality; distance concentration; k-NN method; reverse nearest neighbor; unsupervised outlier detection; Context; Correlation; Educational institutions; Euclidean distance; Histograms; Noise measurement; Standards; Outlier detection; high-dimensional data; reverse nearest neighbors
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2014.2365790
Filename :
6948273