  • DocumentCode
    2210241
  • Title
    Efficient Semi-supervised Spectral Co-clustering with Constraints
  • Author
    Shi, Xiaoxiao ; Fan, Wei ; Yu, Philip S.
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Illinois at Chicago, Chicago, IL, USA
  • fYear
    2010
  • fDate
    13-17 Dec. 2010
  • Firstpage
    1043
  • Lastpage
    1048
  • Abstract
    Co-clustering was proposed to cluster objects and features simultaneously in order to explore inter-correlated patterns. For example, by analyzing blog click-through data, one can find the group of users who are interested in a specific group of blogs and use it for applications such as recommendation. However, it is usually very difficult to achieve good co-clustering quality by analyzing the object-feature correlation data alone, because the data are sparse and noisy. Meanwhile, one may have prior knowledge that indicates the internal structure of the co-clusters. For instance, one may obtain user cluster information from a social network system, and blog-blog similarity from social tags or contents. This prior information provides some supervision toward the co-cluster structures and may help reduce the effect of sparsity and noise. However, most co-clustering algorithms do not use this information and may produce meaningless results. In this paper we study the problem of finding the optimal co-clusters when some objects and features are believed a priori to be in the same cluster. We propose a matrix-decomposition-based approach that formulates the task as a trace minimization problem and solves it efficiently using selected eigenvectors. The asymptotic complexity of the proposed approach is the same as that of co-clustering without constraints. Experiments include graph-pattern co-clustering and document-word co-clustering. For instance, on the graph-pattern data set, the proposed model improves the normalized mutual information by as much as 5.5 times and runs 10 times faster than two naive solutions that expand the edges and vertices in the graphs.
  • Keywords
    constraint handling; correlation methods; eigenvalues and eigenfunctions; graph theory; matrix decomposition; pattern clustering; social networking (online); asymptotic complexity; data sparsity; document-word co-clustering; graph-pattern co-clustering; matrix decomposition based approach; normalized mutual information; object feature analysis; semisupervised spectral coclustering algorithm; social network system; trace minimization problem; Co-clustering; Semi-supervised Learning; Spectral;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining (ICDM), 2010 IEEE 10th International Conference on
  • Conference_Location
    Sydney, NSW
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4244-9131-5
  • Electronic_ISBN
    1550-4786
  • Type
    conf
  • DOI
    10.1109/ICDM.2010.64
  • Filename
    5694082