• DocumentCode
    2027957
  • Title
    An efficient distributed hierarchical-clustering algorithm for large scale data
  • Author
    Tang, Cheng-Hsien ; Huang, An-Ching ; Tsai, Meng-Feng ; Wang, Wei-Jen
  • Author_Institution
    Dept. of Comput. Sci. & Inf. Engineering, Nat. Central Univ., Jhongli, Taiwan
  • fYear
    2010
  • fDate
    16-18 Dec. 2010
  • Firstpage
    869
  • Lastpage
    874
  • Abstract
    The data-classification process can involve a huge amount of data in today's cloud computing environment. Processing it can take a long time and can consume many resources for computation and storage. This study focuses on the problem of using the traditional hierarchical agglomerative clustering algorithm in a distributed environment, since hierarchical agglomerative clustering has high applicability and efficiency. A parallel hierarchical agglomerative clustering algorithm is proposed in this study. The proposed algorithm divides the whole computation into several small tasks, distributes the tasks to message-passing processes, and merges the results to form a hierarchical cluster. A threshold is used to reduce the storage requirement during the computation. To evaluate the performance and limitations of our algorithm, this study has conducted several experiments using real astronomical data, the main asteroid belt catalog. The experimental results confirm that the proposed parallel algorithm is efficient.
  • Keywords
    astronomy computing; cloud computing; message passing; parallel algorithms; pattern classification; pattern clustering; storage management; cloud computing environment; data classification process; large scale data; message passing; parallel algorithm; parallel hierarchical agglomerative clustering algorithm; real astronomical data; storage requirement; Algorithm design and analysis; Classification algorithms; Clustering algorithms; Complexity theory; Partitioning algorithms; Program processors; Symmetric matrices; Hierarchical Clustering; Parallel Computing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Symposium (ICS), 2010 International
  • Conference_Location
    Tainan
  • Print_ISBN
    978-1-4244-7639-8
  • Type
    conf
  • DOI
    10.1109/COMPSYM.2010.5685388
  • Filename
    5685388