• DocumentCode
    760022
  • Title
    Density-based multiscale data condensation
  • Author
    Mitra, Pabitra ; Murthy, C.A. ; Pal, Sankar K.
  • Author_Institution
    Machine Intelligence Unit, Indian Stat. Inst., Calcutta, India
  • Volume
    24
  • Issue
    6
  • fYear
    2002
  • fDate
    6/1/2002 12:00:00 AM
  • Firstpage
    734
  • Lastpage
    747
  • Abstract
    A problem of growing interest in pattern recognition applied to data mining is that of selecting a small representative subset from a very large data set. This article proposes a nonparametric data reduction scheme that attempts to represent the density underlying the data. The algorithm selects representative points in a multiscale fashion, which distinguishes it from existing density-based approaches. The accuracy of representation by the condensed set is measured in terms of the error in density estimates of the original and reduced sets. Experimental studies on several real-life data sets show that the multiscale approach is superior to several related condensation methods in terms of both condensation ratio and estimation error. The condensed set was also shown experimentally to be effective for important data mining tasks such as classification, clustering, and rule generation on large data sets. Moreover, the algorithm is found empirically to be efficient in terms of sample complexity.
  • Keywords
    data mining; data reduction; pattern recognition; very large databases; classification; clustering; condensation ratio; density-based multiscale data condensation; estimation error; experimental studies; instance learning; nonparametric data reduction scheme; rule generation; sample complexity; very large data set; Clustering algorithms; Databases; Density measurement; Iterative algorithms; Nearest neighbor searches; Sampling methods; Vector quantization
  • fLanguage
    English
  • Journal_Title
    Pattern Analysis and Machine Intelligence, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0162-8828
  • Type
    jour
  • DOI
    10.1109/TPAMI.2002.1008381
  • Filename
    1008381