  • DocumentCode
    1126113
  • Title
    Linear-time wrappers to identify atypical points: two subset generation methods
  • Author
    Hashemi, Saeed
  • Author_Institution
    Fac. of Comput. Sci., Dalhousie Univ., Halifax, NS, Canada
  • Volume
    17
  • Issue
    9
  • fYear
    2005
  • Firstpage
    1289
  • Lastpage
    1297
  • Abstract
    The wrapper approach to identifying atypical examples can be preferable to the filter approach (which may not be consistent with the classifier in use), but its running time is prohibitive. The fastest available wrappers are quadratic in the number of examples, which is far too expensive for atypical detection. The algorithm presented in this paper is a linear-time wrapper that is roughly 75 times faster than the quadratic wrappers, on average over the 7 classifiers and 20 data sets tested in this research. Two subset generation methods for the wrapper are also introduced and compared. Atypical points are defined in this paper as the misclassified points that the proposed algorithm (Atypical Sequential Removing: ASR) finds not useful to the classification task. They may include outliers as well as overlapping samples. ASR can identify and rank atypical points in the whole data set without damaging prediction accuracy. It is general enough that classifiers without a reject option can use it. Experiments on benchmark data sets and different classifiers show promising results and confirm that this wrapper method has some advantages and can be used for atypical detection.
  • Keywords
    computational complexity; learning (artificial intelligence); pattern classification; ASR; Atypical Sequential Removing; atypical point identification; classification task; linear-time wrappers; outlier detection; overlapping samples; subset generation method; Accuracy; Automatic speech recognition; Credit cards; Filters; Frequency selective surfaces; Gene expression; Insurance; Intrusion detection; Performance evaluation; Testing; Index Terms: Atypical data; linear wrapper; outlier detection; overlapping samples; sample subset selection
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2005.150
  • Filename
    1490534