  • DocumentCode
    3249348
  • Title
    O-Cluster: scalable clustering of large high dimensional data sets
  • Author
    Milenova, Boriana L.; Campos, Marcos M.
  • Author_Institution
    Oracle Data Min. Technol., Burlington, MA, USA
  • fYear
    2002
  • fDate
    2002
  • Firstpage
    290
  • Lastpage
    297
  • Abstract
    Clustering large data sets of high dimensionality has always been a challenge for clustering algorithms. Many recently developed clustering algorithms have attempted to handle data sets with a very large number of records, a very high number of dimensions, or both. We provide a discussion of the advantages and limitations of existing algorithms when they operate on very large multidimensional data sets. To simultaneously overcome both the "curse of dimensionality" and the scalability problems associated with large amounts of data, we propose a new clustering algorithm called O-Cluster. O-Cluster combines a novel active sampling technique with an axis-parallel partitioning strategy to identify continuous areas of high density in the input space. The method operates on a limited memory buffer and requires at most a single scan through the data. We demonstrate the high quality of the obtained clustering solutions, their robustness to noise, and O-Cluster's excellent scalability.
  • Keywords
    computational complexity; data mining; pattern clustering; very large databases; O-Cluster; active sampling technique; axis-parallel partitioning strategy; complexity; data handling; data mining; large high dimensional data sets; limited memory buffer; multidimensional data sets; scalability; scalable clustering; Clustering algorithms; Computational complexity; Data mining; Information retrieval; Multidimensional systems; Multimedia databases; Partitioning algorithms; Sampling methods; Scalability; Shape;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2002. ICDM 2002. Proceedings. 2002 IEEE International Conference on
  • Print_ISBN
    0-7695-1754-4
  • Type
    conf
  • DOI
    10.1109/ICDM.2002.1183915
  • Filename
    1183915
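The abstract does not spell out O-Cluster's partitioning rule. Purely as an illustration of the generic idea of axis-parallel, density-based splitting, the Python sketch below works under assumed details: each attribute is projected onto an equal-width histogram, and the data are cut at the centre of the lowest bin that lies between two higher bins, choosing the axis whose valley is emptiest. The function names (histogram, deepest_valley, axis_parallel_split, partition), the bin count, and the lowest-valley selection rule are hypothetical. The sketch omits active sampling, the limited memory buffer, and any statistical test of split significance, so it is not the published algorithm.

```python
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def histogram(values: Sequence[float], lo: float, hi: float, bins: int) -> List[int]:
    """Count values into `bins` equal-width bins spanning [lo, hi]."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so the maximum value lands in the last bin instead of overflowing.
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts


def deepest_valley(counts: Sequence[int]) -> Optional[int]:
    """Index of the lowest bin with a strictly higher bin on both sides, or None."""
    best = None
    for i in range(1, len(counts) - 1):
        if counts[i] < max(counts[:i]) and counts[i] < max(counts[i + 1:]):
            if best is None or counts[i] < counts[best]:
                best = i
    return best


def axis_parallel_split(points: Sequence[Point], bins: int = 10) -> Optional[Tuple[int, float]]:
    """Hypothetical rule: return (dimension, threshold) through the emptiest valley on any axis."""
    best: Optional[Tuple[int, float, int]] = None  # (dim, threshold, valley count)
    for d in range(len(points[0])):
        values = [p[d] for p in points]
        lo, hi = min(values), max(values)
        if hi == lo:
            continue  # constant attribute: nothing to split
        counts = histogram(values, lo, hi, bins)
        i = deepest_valley(counts)
        if i is None:
            continue  # no valley between two peaks on this axis
        threshold = lo + (i + 0.5) * (hi - lo) / bins  # centre of the valley bin
        if best is None or counts[i] < best[2]:
            best = (d, threshold, counts[i])
    return None if best is None else (best[0], best[1])


def partition(points: Sequence[Point], dim: int, threshold: float):
    """Split points into the two half-spaces on either side of an axis-parallel cut."""
    left = [p for p in points if p[dim] < threshold]
    right = [p for p in points if p[dim] >= threshold]
    return left, right


if __name__ == "__main__":
    # Two dense groups along the first axis; the second axis is spread evenly.
    data = [(0.5, float(j)) for j in range(10)] + [(9.5, float(j)) for j in range(10)]
    dim, cut = axis_parallel_split(data, bins=5)
    left, right = partition(data, dim, cut)
    print(dim, len(left), len(right))  # 0 10 10
```

In this toy example only the first axis has a valley between two peaks, so the cut falls on that axis and separates the two dense groups; the evenly spread second axis yields no valley and is not used. Applying the same split recursively to each half would give a hierarchy of axis-parallel dense regions.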