• DocumentCode
    3255971
  • Title

    Extending k-Means-Based Algorithms for Evolving Data Streams with Variable Number of Clusters

  • Author

    De Andrade Silva, Jonathan ; Hruschka, Eduardo Raul

  • Author_Institution
    Univ. of Sao Paulo (USP) at Sao Carlos, Sao Carlos, Brazil
  • Volume
    2
  • fYear
    2011
  • fDate
    18-21 Dec. 2011
  • Firstpage
    14
  • Lastpage
    19
  • Abstract
    Many algorithms for clustering data streams based on the widely used k-Means have been proposed in the literature. Most of them assume that the number of clusters, k, is known and fixed a priori by the user. Aimed at relaxing this assumption, which is often unrealistic in practical applications, we describe an algorithmic framework that allows estimating k automatically from data. We illustrate the potential of the proposed framework by using three state-of-the-art algorithms for clustering data streams - Stream LSearch, CluStream, and Stream KM++ - combined with two well-known algorithms for estimating the number of clusters, namely Ordered Multiple Runs of k-Means (OMRk) and Bisecting k-Means (BkM). As an additional contribution, we experimentally compare the resulting algorithmic instantiations on both synthetic and real-world data streams. Analyses of statistical significance suggest that OMRk yields the best data partitions, while BkM is more computationally efficient. Also, the combination of Stream KM++ with OMRk leads to the best trade-off between accuracy and efficiency.
  • Keywords
    data handling; pattern clustering; CluStream algorithm; Stream KM++ algorithm; Stream LSearch algorithm; bisecting k-means algorithm; data partitioning; data stream clustering; evolving data stream; k-means-based algorithm; ordered multiple runs of k-means algorithm; variable cluster number; Approximation algorithms; Clustering algorithms; Heuristic algorithms; Indexes; Machine learning algorithms; Partitioning algorithms; Prototypes; Clustering; Data Stream; Online Clustering
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Applications and Workshops (ICMLA), 2011 10th International Conference on
  • Conference_Location
    Honolulu, HI
  • Print_ISBN
    978-1-4577-2134-2
  • Type

    conf

  • DOI
    10.1109/ICMLA.2011.67
  • Filename
    6147041