• DocumentCode
    2002442
  • Title

    Density-biased clustering based on reservoir sampling

  • Author

    Kerdprasop, Kittisak ; Kerdprasop, Nittaya ; Sattayatham, Pairote

  • Author_Institution
    Data Eng. & Knowledge Discovery Res. Unit, Suranaree Univ. of Technol., Thailand
  • fYear
    2005
  • fDate
    22-26 Aug. 2005
  • Firstpage
    1122
  • Lastpage
    1126
  • Abstract
    Clustering is the task of grouping data by similarity. The popular k-means algorithm first assigns every data point to its closest cluster and then recomputes the cluster means, repeating these two steps until convergence. We propose a variation, called weighted k-means, that improves clustering scalability. To speed up the clustering process, we develop reservoir-biased sampling, an efficient data reduction technique that requires only a single scan over the data set. Our algorithm is designed to group data from mixture models. We present an experimental evaluation of the proposed method.
  • Keywords
    data reduction; pattern clustering; sampling methods; very large databases; data grouping; data reduction technique; density-biased clustering; reservoir-biased sampling; weighted k-means algorithm; Clustering algorithms; Councils; Data engineering; Databases; Iterative algorithms; Knowledge engineering; Partitioning algorithms; Reservoirs; Sampling methods; Scalability;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Database and Expert Systems Applications, 2005. Proceedings. Sixteenth International Workshop on
  • ISSN
    1529-4188
  • Print_ISBN
    0-7695-2424-9
  • Type

    conf

  • DOI
    10.1109/DEXA.2005.72
  • Filename
    1508425
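
The abstract describes two building blocks: a single-scan sampling step and the two-step k-means iteration. The sketch below illustrates both in generic form. It is not the authors' method. `reservoir_sample` is the standard uniform reservoir sampler (Algorithm R), not the density-biased variant the paper proposes, and `kmeans` is plain unweighted k-means, not the weighted variant. The function names and parameters are hypothetical, chosen only for this illustration.

```python
import random


def reservoir_sample(stream, k, rng):
    """Draw a uniform sample of up to k items in a single pass (Algorithm R).

    Assumption: uniform sampling only; the paper's density bias is not modeled.
    """
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centers, max_iter=100):
    """Plain k-means on tuples of floats, starting from the given centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Step 1: assign each point to its closest center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)),
                key=lambda c: squared_distance(p, centers[c]),
            )
            groups[nearest].append(p)
        # Step 2: recompute each center as the mean of its group.
        # An empty group keeps its previous center.
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break  # converged: the centers no longer move
        centers = new_centers
    return centers
```

A typical use under these assumptions would be to draw a sample with `reservoir_sample(data, k, random.Random(seed))` during one scan of the data and then run `kmeans` on the much smaller sample instead of the full data set.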