  • DocumentCode
    2771789
  • Title
    On K-Means Cluster Preservation Using Quantization Schemes
  • Author
    Turaga, Deepak S.; Vlachos, Michail; Verscheure, Olivier
  • Author_Institution
    IBM T.J. Watson Res. Center, Hawthorne, NY, USA
  • fYear
    2009
  • fDate
    6-9 Dec. 2009
  • Firstpage
    533
  • Lastpage
    542
  • Abstract
    This work examines under what conditions compression methodologies can retain the outcome of clustering operations. We focus on the popular k-means clustering algorithm and demonstrate how a properly constructed compression scheme based on post-clustering quantization is capable of maintaining the global cluster structure. Our analytical derivations indicate that a 1-bit moment preserving quantizer per cluster is sufficient to retain the original data clusters. Merits of the proposed compression technique include: a) reduced storage requirements with clustering guarantees, b) data privacy on the original values, and c) shape preservation for data visualization purposes. We evaluate the quantization scheme on various high-dimensional datasets, including 1-dimensional and 2-dimensional time-series (shape datasets), and demonstrate the cluster preservation property. We also compare with previously proposed simplification techniques in the time-series area and show significant improvements in both the clustering and shape preservation of the compressed datasets.
  • Keywords
    data compression; data mining; pattern clustering; time series; compression methodology; global cluster structure; k-means cluster preservation; post-clustering quantization; time-series; Artificial intelligence; Clustering algorithms; Costs; Data analysis; Data mining; Laboratories; Partitioning algorithms; Quantization; Shape; USA Councils; clustering preservation; moment preserving quantization; privacy preservation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2009. ICDM '09. Ninth IEEE International Conference on
  • Conference_Location
    Miami, FL
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4244-5242-2
  • Electronic_ISBN
    1550-4786
  • Type
    conf
  • DOI
    10.1109/ICDM.2009.12
  • Filename
    5360279