• DocumentCode
    3166749
  • Title
    Co-ClusterD: A Distributed Framework for Data Co-Clustering with Sequential Updates
  • Author
    Sen Su ; Xiang Cheng ; Lixin Gao ; Jiangtao Yin
  • fYear
    2013
  • fDate
    7-10 Dec. 2013
  • Firstpage
    1193
  • Lastpage
    1198
  • Abstract
    Co-clustering is a powerful data mining tool for co-occurrence and dyadic data. As data sets become increasingly large, the scalability of co-clustering becomes more and more important. In this paper, we propose two approaches to parallelize co-clustering with sequential updates in a distributed environment. Based on these two approaches, we present a new distributed framework, Co-ClusterD, that supports efficient implementations of co-clustering algorithms with sequential updates. We design and implement Co-ClusterD, and show its efficiency through two co-clustering algorithms: fast nonnegative matrix tri-factorization (FNMTF) and information theoretic co-clustering (ITCC). We evaluate our framework on both a local cluster of machines and the Amazon EC2 cloud. Our evaluation shows that co-clustering algorithms implemented in Co-ClusterD can achieve better results and run faster than their traditional concurrent counterparts.
  • Keywords
    data mining; distributed processing; information theory; matrix decomposition; pattern clustering; Amazon EC2 cloud; Co-ClusterD; FNMTF; ITCC; coclustering parallelization approach; data coclustering algorithm; distributed environment; distributed framework; dyadic data; fast nonnegative matrix trifactorization; information theoretic coclustering; sequential updates; Algorithm design and analysis; Clustering algorithms; Convergence; Distributed databases; Linear programming; Scalability; Synchronization; Cloud Computing; Co-Clustering; Concurrent Updates; Distributed Framework; Sequential Updates
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Data Mining (ICDM), 2013 IEEE 13th International Conference on
  • Conference_Location
    Dallas, TX
  • ISSN
    1550-4786
  • Type
    conf
  • DOI
    10.1109/ICDM.2013.76
  • Filename
    6729620