• DocumentCode
    773388
  • Title
    Distance-based detection and prediction of outliers
  • Author
    Angiulli, Fabrizio ; Basta, Stefano ; Pizzuti, Clara
  • Author_Institution
    Calabria Univ., Italy
  • Volume
    18
  • Issue
    2
  • fYear
    2006
  • Firstpage
    145
  • Lastpage
    160
  • Abstract
    A distance-based outlier detection method is proposed that finds the top outliers in an unlabeled data set and provides a subset of it, called the outlier detection solving set, that can be used to predict the outlierness of new, unseen objects. The solving set includes enough points to permit the detection of the top outliers by considering only a subset of all the pairwise distances in the data set. The properties of the solving set are investigated, and algorithms for computing it with subquadratic time requirements are proposed. Experiments on synthetic and real data sets that evaluate the effectiveness of the approach are presented. A scaling analysis of the solving set size is performed, and the false positive rate, that is, the fraction of new objects misclassified as outliers when the solving set is used instead of the overall data set, is shown to be negligible. Finally, to investigate the accuracy in separating outliers from inliers, an ROC analysis of the method is performed. The results show that using the solving set instead of the data set guarantees comparable prediction quality at a lower computational cost.
  • Keywords
    data analysis; data mining; pattern classification; sensitivity analysis; ROC analysis; distance-based outlier detection; distance-based outlier prediction; outlier detection solving set; Computational efficiency; Insurance; Intrusion detection; Medical diagnosis; Nearest neighbor searches; Object detection; Performance analysis; Predictive models; Weight measurement; Index Terms: Distance-based outliers; outlier detection; outlier prediction; data mining
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2006.29
  • Filename
    1563979