• DocumentCode
    3089804
  • Title
    An algorithm for mining outliers in categorical data through ranking
  • Author
    Suri, N.N.R.R. ; Murty, M. Narasimha ; Athithan, G.
  • Author_Institution
    Centre for AI & Robot., Bangalore, India
  • fYear
    2012
  • fDate
    4-7 Dec. 2012
  • Firstpage
    247
  • Lastpage
    252
  • Abstract
    The rapid growth in the field of data mining has led to the development of various methods for outlier detection. Though detection of outliers has been well explored in the context of numerical data, dealing with categorical data is still evolving. In this paper, we propose a two-phase algorithm for detecting outliers in categorical data based on a novel definition of outliers. In the first phase, this algorithm explores a clustering of the given data, followed by the ranking phase for determining the set of most likely outliers. The proposed algorithm is expected to perform better as it can identify different types of outliers, employing two independent ranking schemes based on the attribute value frequencies and the inherent clustering structure in the given data. Unlike some existing methods, the computational complexity of this algorithm is not affected by the number of outliers to be detected. The efficacy of this algorithm is demonstrated through experiments on various public domain categorical data sets.
  • Keywords
    computational complexity; data mining; pattern clustering; attribute value frequencies; clustering structure; data clustering; independent ranking schemes; outlier detection; outlier mining algorithm; public domain categorical data sets; two-phase algorithm; Algorithm design and analysis; Benchmark testing; Clustering algorithms; Greedy algorithms; Roads; Categorical data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Hybrid Intelligent Systems (HIS), 2012 12th International Conference on
  • Conference_Location
    Pune
  • Print_ISBN
    978-1-4673-5114-0
  • Type
    conf
  • DOI
    10.1109/HIS.2012.6421342
  • Filename
    6421342