• DocumentCode
    782417
  • Title
    ClusterTree: integration of cluster representation and nearest-neighbor search for large data sets with high dimensions
  • Author
    Yu, Dantong ; Zhang, Aidong
  • Author_Institution
    Dept. of Comput. Sci., State Univ. of New York, Buffalo, NY, USA
  • Volume
    15
  • Issue
    5
  • fYear
    2003
  • Firstpage
    1316
  • Lastpage
    1337
  • Abstract
    We introduce the ClusterTree, a new indexing approach for representing clusters generated by any existing clustering approach. A cluster is decomposed into several subclusters and represented as the union of the subclusters. The subclusters can be further decomposed, which isolates the most related groups within the clusters. A ClusterTree is a hierarchy of clusters and subclusters which incorporates the cluster representation into the index structure to achieve effective and efficient retrieval. Our cluster representation is highly adaptive to any kind of cluster. It is well accepted that most existing indexing techniques degrade rapidly as the dimensionality increases. The ClusterTree provides a practical solution for indexing clustered data sets and supports effective retrieval of the nearest neighbors without having to linearly scan the high-dimensional data set. We also discuss an approach to dynamically reconstruct the ClusterTree when new data is added. We present a detailed analysis of this approach and justify it extensively with experiments.
  • Keywords
    database indexing; pattern clustering; tree data structures; very large databases; ClusterTree; cluster representation; clustered data sets; index structure; indexing approach; large data sets; nearest-neighbor search; retrieval; subclusters; Clustering algorithms; Data mining; Degradation; Feature extraction; Image reconstruction; Indexing; Information retrieval; Large-scale systems; Multidimensional systems; Nearest neighbor searches
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2003.1232281
  • Filename
    1232281