• DocumentCode
    3515584
  • Title

    Scalable parallel co-clustering over multiple heterogeneous data types

  • Author

    Folino, Francesco ; Greco, Gianluigi ; Guzzo, Antonella ; Pontieri, Luigi

  • Author_Institution
    ICAR, CNR, Italy
  • fYear
    2010
  • fDate
    June 28 2010-July 2 2010
  • Firstpage
    529
  • Lastpage
    535
  • Abstract
    Bi-clustering, i.e., simultaneously clustering two types of objects based on their correlations, has been studied actively in the last few years by virtue of its impact on several relevant applications, such as text mining, collaborative filtering, and gene expression analysis. In particular, much recent research effort has gone into extending the problem to higher-order scenarios, where more than two data types are to be clustered synergistically according to pairwise inter-type relations. A number of alternate-optimization methods that measure co-clustering quality as a weighted combination of the distortions over the input relations have recently been developed, and they scale linearly with the size of the data. Even linear scaling is likely to be inadequate for large-scale applications where massive volumes of data are involved and high-performance solutions would be desirable. However, to date, parallel clustering approaches have been investigated in depth only for the case of one or two inter-related data types. In this paper, we address the more general (high-order) co-clustering problem by proposing a parallel implementation of an effective, state-of-the-art method that leverages a parallel computation infrastructure implementing the popular Map-Reduce paradigm.
  • Keywords
    Correlation; Distributed databases; Encoding; Face; Joints; Loss measurement; Probability; Co-Clustering; Data Mining
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Computing and Simulation (HPCS), 2010 International Conference on
  • Conference_Location
    Caen, France
  • Print_ISBN
    978-1-4244-6827-0
  • Type

    conf

  • DOI
    10.1109/HPCS.2010.5547087
  • Filename
    5547087