• DocumentCode
    3228650
  • Title
    Clustering algorithm based on optimal intervals division for high-dimension data streams
  • Author
    Li, Yinzhao ; Ren, Jiadong ; Hu, Changzheng ; Xu, Lina
  • Author_Institution
    Lab. of Comput. Network Defense Technol., Beijing Inst. of Technol., Beijing, China
  • fYear
    2009
  • fDate
    25-28 July 2009
  • Firstpage
    783
  • Lastpage
    787
  • Abstract
    Clustering high-dimensional data streams is a main focus of clustering research. To optimize the clustering process, and in particular to handle the large number of candidate subspaces it generates, optimal segmentation section technology and the FP-tree structure are introduced, and on this basis the DOIC (dynamic optimal intervals-based cluster) algorithm is proposed. In this paper, memory-based data partition and optimal intervals division are defined to generate high-density grids for each dimension, which are stored in a high-density unit tree (HDU). The HDU-tree is built according to the principle that high-density grids for the same interval in every dimension are stored in the same branch. The process of clustering high-dimensional data streams is thus transformed into a search for dense grids in the HDU-tree. By merging HDU-trees, new stream data are inserted and historical stream data are decayed, which keeps the clustering up to date. The clustering result is returned on request in the form of DNF expressions. The experimental results demonstrate that DOIC has better space scalability and higher clustering quality than traditional clustering algorithms.
  • Keywords
    pattern clustering; tree data structures; DOIC; FP-tree structure; HDU; clustering algorithm; dynamic optimal intervals-based cluster algorithm; high-density grid; high-density unit tree; high-dimensional data stream; memory-based data partition; optimal intervals division; optimal segmentation section technology; Clustering algorithms; Computer networks; Computer science; Computer science education; Educational institutions; Educational technology; Information science; Partitioning algorithms; Shape; Space technology; Clustering; Data stream; High-dimension; Intervals division;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science & Education, 2009. ICCSE '09. 4th International Conference on
  • Conference_Location
    Nanning
  • Print_ISBN
    978-1-4244-3520-3
  • Electronic_ISBN
    978-1-4244-3521-0
  • Type
    conf
  • DOI
    10.1109/ICCSE.2009.5228155
  • Filename
    5228155
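
The abstract's general idea of decaying historical data, finding dense grids, and reporting clusters as DNF expressions can be illustrated with a minimal sketch. This is not the DOIC algorithm and does not build an HDU-tree: it uses a plain dictionary of fixed-width grid cells, and the class name `DecayedCellGrid`, the function `to_dnf`, and the parameters `width`, `decay`, and `threshold` are all assumptions introduced here for illustration.

```python
from collections import defaultdict


class DecayedCellGrid:
    """Counts points per fixed-width grid cell, decaying old counts per batch.

    Illustrative only: a flat dictionary stands in for the paper's HDU-tree,
    and equal-width intervals stand in for its optimal intervals division.
    """

    def __init__(self, width=1.0, decay=0.5):
        self.width = width    # hypothetical interval width (same in every dimension)
        self.decay = decay    # hypothetical factor applied to old counts per batch
        self.counts = defaultdict(float)

    def insert_batch(self, points):
        # Decay historical counts first, then add the new batch.
        for cell in self.counts:
            self.counts[cell] *= self.decay
        for point in points:
            cell = tuple(int(x // self.width) for x in point)
            self.counts[cell] += 1.0

    def dense_cells(self, threshold):
        # A cell is "dense" when its decayed count reaches the threshold.
        return sorted(c for c, n in self.counts.items() if n >= threshold)


def to_dnf(cells, width):
    """Render dense cells as a disjunction of per-dimension interval conjunctions."""
    clauses = []
    for cell in cells:
        terms = [
            f"{k * width} <= x{d} < {(k + 1) * width}"
            for d, k in enumerate(cell)
        ]
        clauses.append("(" + " AND ".join(terms) + ")")
    return " OR ".join(clauses)


grid = DecayedCellGrid(width=1.0, decay=0.5)
grid.insert_batch([(0.2, 0.3), (0.5, 0.9), (0.7, 0.1), (3.5, 3.5)])
first = to_dnf(grid.dense_cells(threshold=2), grid.width)

# A later batch concentrated elsewhere: the old dense cell decays below threshold.
grid.insert_batch([(3.1, 3.2), (3.3, 3.4), (3.6, 3.7)])
second = to_dnf(grid.dense_cells(threshold=2), grid.width)
```

In this sketch the first query reports the cell around the origin, and after the second batch the decayed count of that cell drops below the threshold while the cell around (3, 3) becomes dense. The paper's approach differs in how intervals are chosen and how dense units are stored and merged.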