  • DocumentCode
    2186092
  • Title
    Clustering Data Streams Using Mass Estimation
  • Author
    Sabau, Andrei Sorin
  • Author_Institution
    Fac. of Math. & Comput. Sci., Univ. of Pitesti, Pitesti, Romania
  • fYear
    2013
  • fDate
    23-26 Sept. 2013
  • Firstpage
    289
  • Lastpage
    295
  • Abstract
    The explosive growth of data generation, storage and analysis within the last decade has led to extensive research on stream mining algorithms. The existing stream clustering literature contains both adaptations of classical methods and novel methods that attempt to address the space and time scalability issues arising from high-volume, high-velocity information assets. This paper presents MaStream, a novel stream clustering algorithm with constant space complexity and sub-linear average-case time complexity. The algorithm uses mass estimation as an alternative to density estimation and does not employ any distance measure, which makes it highly adaptable to both low- and high-dimensional data streams. Using an evolving ensemble of h:d-Trees, the algorithm identifies arbitrarily shaped clusters and handles both noise and outliers without a priori information such as the total number of clusters. Experimental results on a series of synthetic and real datasets illustrate the algorithm's performance.
  • Keywords
    computational complexity; data analysis; data mining; pattern clustering; trees (mathematics); MaStream; Mass Estimation; constant space complexity; data generation; data storage; density estimation; h:d-trees; novel data stream clustering algorithm; stream mining algorithms; sub-linear time complexity; Algorithm design and analysis; Clustering algorithms; Data mining; Data models; Estimation; Partitioning algorithms; Vegetation; clustering ensemble; mass-based clustering; stream clustering
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), 2013 15th International Symposium on
  • Conference_Location
    Timisoara
  • Print_ISBN
    978-1-4799-3035-7
  • Type
    conf
  • DOI
    10.1109/SYNASC.2013.45
  • Filename
    6821162
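The abstract's two key claims, constant space and no distance measure, can be illustrated with a small sketch of ensemble mass estimation over a stream. This is a simplified stand-in, not MaStream and not the paper's exact h:d-Tree construction: it assumes every attribute lies in a known work range [lows, highs], and each hypothetical `RandomMassTree` splits a randomly chosen attribute at a uniform random point of the node's range down to a fixed depth h. The names `RandomMassTree`, `MassEnsemble`, `n_trees`, and `depth` are illustrative only. The mass of a point is the number of stream points that fell into the same leaf, averaged over the ensemble. No cluster extraction is attempted.

```python
import random
from typing import List, Sequence


class RandomMassTree:
    """Fixed-depth random partition tree counting stream points per leaf.

    Hypothetical, simplified stand-in for an h:d-Tree: each internal node
    splits one randomly chosen attribute at a uniform random point of the
    node's current range inside a fixed work space [lows, highs].
    """

    def __init__(self, lows: Sequence[float], highs: Sequence[float],
                 depth: int, rng: random.Random) -> None:
        self.depth = depth
        n_nodes = 2 ** (depth + 1) - 1          # fixed size: independent of stream length
        self.split_dim: List[int] = [0] * n_nodes
        self.split_val: List[float] = [0.0] * n_nodes
        self.counts: List[int] = [0] * n_nodes  # only leaf entries are incremented
        self._build(0, list(lows), list(highs), 0, rng)

    def _build(self, node: int, lows: List[float], highs: List[float],
               level: int, rng: random.Random) -> None:
        if level == self.depth:
            return                              # leaf: no split needed
        d = rng.randrange(len(lows))
        cut = rng.uniform(lows[d], highs[d])
        self.split_dim[node], self.split_val[node] = d, cut
        left_highs = highs.copy()
        left_highs[d] = cut
        right_lows = lows.copy()
        right_lows[d] = cut
        # Heap layout: children of node i are 2i+1 (left) and 2i+2 (right).
        self._build(2 * node + 1, lows, left_highs, level + 1, rng)
        self._build(2 * node + 2, right_lows, highs, level + 1, rng)

    def _leaf(self, x: Sequence[float]) -> int:
        # Route x with per-attribute comparisons only; no distance is computed.
        node = 0
        for _ in range(self.depth):
            d = self.split_dim[node]
            node = 2 * node + 1 if x[d] < self.split_val[node] else 2 * node + 2
        return node

    def insert(self, x: Sequence[float]) -> None:
        self.counts[self._leaf(x)] += 1

    def leaf_mass(self, x: Sequence[float]) -> int:
        return self.counts[self._leaf(x)]


class MassEnsemble:
    """Averages leaf mass over several independently randomized trees."""

    def __init__(self, lows: Sequence[float], highs: Sequence[float],
                 n_trees: int = 25, depth: int = 6, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.trees = [RandomMassTree(lows, highs, depth, rng) for _ in range(n_trees)]

    def update(self, x: Sequence[float]) -> None:
        for tree in self.trees:
            tree.insert(x)

    def mass(self, x: Sequence[float]) -> float:
        return sum(t.leaf_mass(x) for t in self.trees) / len(self.trees)


# Usage: two dense regions in [0, 1]^2; mass should be high inside them, low elsewhere.
ensemble = MassEnsemble([0.0, 0.0], [1.0, 1.0], n_trees=25, depth=6, seed=0)
data_rng = random.Random(1)
for _ in range(500):
    ensemble.update([0.2 + data_rng.uniform(-0.05, 0.05), 0.2 + data_rng.uniform(-0.05, 0.05)])
    ensemble.update([0.8 + data_rng.uniform(-0.05, 0.05), 0.8 + data_rng.uniform(-0.05, 0.05)])
print(ensemble.mass([0.2, 0.2]), ensemble.mass([0.95, 0.05]))
```

Each tree allocates its 2^(h+1) - 1 nodes once at construction, so memory stays fixed no matter how many points arrive, and each update or query costs O(h) comparisons per tree. Points inside the two dense regions receive a much larger mass than points in the empty corner, which is the property a mass-based clusterer can exploit to separate clusters from noise.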