  • DocumentCode
    3143473
  • Title
    Locality Sensitive Outlier Detection: A ranking driven approach
  • Author
    Wang, Ye ; Parthasarathy, Srinivasan ; Tatikonda, Shirish
  • Author_Institution
    Comput. Sci. & Eng. Dept., Ohio State Univ., Columbus, OH, USA
  • fYear
    2011
  • fDate
    11-16 April 2011
  • Firstpage
    410
  • Lastpage
    421
  • Abstract
    Outlier detection is fundamental to a variety of database and analytic tasks. Recently, distance-based outlier detection has emerged as a viable and scalable alternative to traditional statistical and geometric approaches. In this article we explore the role of ranking for the efficient discovery of distance-based outliers from large high dimensional data sets. Specifically, we develop a light-weight ranking scheme that is powered by locality sensitive hashing, which reorders the database points according to their likelihood of being an outlier. We provide theoretical arguments to justify the rationale for the approach and subsequently conduct an extensive empirical study highlighting the effectiveness of our approach over extant solutions. We show that our ranking scheme improves the efficiency of the distance-based outlier discovery process by up to 5-fold. Furthermore, we find that using our approach the top outliers can often be isolated very quickly, typically by scanning less than 3% of the data set.
  • Keywords
    data handling; file organisation; distance-based outlier detection; distance-based outlier discovery process; light-weight ranking scheme; locality sensitive hashing; locality sensitive outlier detection; ranking driven approach; Algorithm design and analysis; Approximation algorithms; Artificial neural networks; Clustering algorithms; Databases; Nearest neighbor searches; Optimization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering (ICDE), 2011 IEEE 27th International Conference on
  • Conference_Location
    Hannover
  • ISSN
    1063-6382
  • Print_ISBN
    978-1-4244-8959-6
  • Electronic_ISBN
    1063-6382
  • Type
    conf
  • DOI
    10.1109/ICDE.2011.5767852
  • Filename
    5767852
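
  • Illustrative_Sketch
    The abstract describes reordering database points with locality sensitive hashing so that likely outliers are examined first during distance-based outlier discovery. The following is a minimal sketch of that general idea, not the authors' algorithm. It assumes p-stable (Gaussian) hash functions, approximates "likelihood of being an outlier" by how sparsely populated a point's hash bucket is, and scores a point by the distance to its k-th nearest neighbor. The names `lsh_rank` and `top_outliers` and all parameters are hypothetical, chosen for illustration only.

```python
import math
import random
from collections import defaultdict


def lsh_rank(points, num_hashes=4, bucket_width=1.0, seed=0):
    """Return point indices ordered so points in sparse LSH buckets come first.

    Assumption: a sparsely populated bucket suggests an isolated point.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    # p-stable LSH for Euclidean distance: h(x) = floor((a . x + b) / w)
    projections = [
        ([rng.gauss(0.0, 1.0) for _ in range(dim)],
         rng.uniform(0.0, bucket_width))
        for _ in range(num_hashes)
    ]
    buckets = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(
            math.floor(
                (sum(a_i * x_i for a_i, x_i in zip(a, p)) + b) / bucket_width
            )
            for a, b in projections
        )
        buckets[key].append(idx)
    bucket_size = {}
    for members in buckets.values():
        for idx in members:
            bucket_size[idx] = len(members)
    # Stable sort: ties keep their original database order.
    return sorted(range(len(points)), key=lambda i: bucket_size[i])


def top_outliers(points, n=1, k=2, order=None):
    """Top-n distance-based outliers, scored by k-th nearest-neighbor distance.

    Candidates are visited in `order`. A candidate is abandoned as soon as
    its running k-th NN distance drops below the weakest score in the current
    top-n. Assumes len(points) > k.
    """
    order = list(order) if order is not None else list(range(len(points)))
    cutoff = 0.0
    top = []  # (score, index) pairs, largest score first
    for i in order:
        nearest = []  # the k smallest distances seen so far, ascending
        pruned = False
        for j in range(len(points)):
            if i == j:
                continue
            d = math.dist(points[i], points[j])
            if len(nearest) < k or d < nearest[-1]:
                nearest.append(d)
                nearest.sort()
                nearest = nearest[:k]
                if len(nearest) == k and nearest[-1] < cutoff:
                    pruned = True  # cannot enter the top-n any more
                    break
        if not pruned:
            top.append((nearest[-1], i))
            top.sort(reverse=True)
            top = top[:n]
            if len(top) == n:
                cutoff = top[-1][0]
    return top
```

    Calling `top_outliers(points, n, k, order=lsh_rank(points))` returns the same outliers as the unranked scan whenever scores are not tied at the cutoff. The ordering only affects how early the pruning cutoff rises, which is where a ranking scheme of this kind would be expected to save distance computations.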