  • DocumentCode
    28364
  • Title
    Reverse Nearest Neighbors in Unsupervised Distance-Based Outlier Detection
  • Author
    Radovanovic, Milos; Nanopoulos, Alexandros; Ivanovic, Mirjana
  • Author_Institution
    Faculty of Sciences, University of Novi Sad, Novi Sad, Serbia
  • Volume
    27
  • Issue
    5
  • fYear
    2015
  • fDate
    May 1, 2015
  • Firstpage
    1369
  • Lastpage
    1382
  • Abstract
    Outlier detection in high-dimensional data presents various challenges resulting from the “curse of dimensionality.” A prevailing view is that distance concentration, i.e., the tendency of distances in high-dimensional data to become indiscernible, hinders the detection of outliers by making distance-based methods label all points as almost equally good outliers. In this paper, we provide evidence supporting the opinion that such a view is too simple, by demonstrating that distance-based methods can produce more contrasting outlier scores in high-dimensional settings. Furthermore, we show that high dimensionality can have a different impact, by reexamining the notion of reverse nearest neighbors in the unsupervised outlier-detection context. Namely, it was recently observed that the distribution of points’ reverse-neighbor counts becomes skewed in high dimensions, resulting in the phenomenon known as hubness. We provide insight into how some points (antihubs) appear very infrequently in k-NN lists of other points, and explain the connection between antihubs, outliers, and existing unsupervised outlier-detection methods. By evaluating the classic k-NN method, the angle-based technique designed for high-dimensional data, the density-based local outlier factor and influenced outlierness methods, and antihub-based methods on various synthetic and real-world data sets, we offer novel insight into the usefulness of reverse neighbor counts in unsupervised outlier detection.
  • Keywords
    data analysis; pattern classification; unsupervised learning; data dimensionality; distance concentration; k-NN method; reverse nearest neighbor; unsupervised outlier detection; Context; Correlation; Educational institutions; Euclidean distance; Histograms; Noise measurement; Standards; Outlier detection; high-dimensional data; reverse nearest neighbors
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Knowledge and Data Engineering
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2014.2365790
  • Filename
    6948273
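The abstract's central quantity is the reverse-neighbor count: for a point x, N_k(x) is the number of other points whose k-NN lists contain x, and points with very low N_k (antihubs) are candidate outliers. The following is a minimal sketch of that idea, assuming Euclidean distance and brute-force k-NN search. The function names, the toy data, and the score 1 / (1 + N_k) are illustrative assumptions; this is not the authors' exact AntiHub method, whose scoring may differ.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def knn_lists(points: List[Point], k: int) -> List[List[int]]:
    """Brute-force k-NN: for each point, indices of its k nearest *other* points."""
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    lists = []
    for i, p in enumerate(points):
        # Euclidean distances to every other point; ties are broken by index.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        lists.append([j for _, j in dists[:k]])
    return lists


def k_occurrence(points: List[Point], k: int) -> List[int]:
    """N_k(x): how many k-NN lists contain x, i.e. its reverse-neighbor count."""
    counts = [0] * len(points)
    for neighbors in knn_lists(points, k):
        for j in neighbors:
            counts[j] += 1
    return counts


def antihub_scores(points: List[Point], k: int) -> List[float]:
    """Hypothetical outlier score: higher for points with fewer reverse neighbors.

    1 / (1 + N_k) is an assumed monotonically decreasing choice, not
    necessarily the function used in the paper.
    """
    return [1.0 / (1 + c) for c in k_occurrence(points, k)]


if __name__ == "__main__":
    # Hypothetical toy data: the corners of a unit square plus one distant point.
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
    print(k_occurrence(data, k=2))
    print(antihub_scores(data, k=2))
```

With k = 2, every square corner finds its two nearest neighbors among the other corners, so the distant point appears in no k-NN list, gets N_2 = 0, and receives the highest score. Because each point contributes exactly k entries, the counts always sum to n·k (mean N_k equals k); hubness shows up as skew around that mean, with a few hubs and many antihubs.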