DocumentCode
2210241
Title
Efficient Semi-supervised Spectral Co-clustering with Constraints
Author
Shi, Xiaoxiao ; Fan, Wei ; Yu, Philip S.
Author_Institution
Dept. of Comput. Sci., Univ. of Illinois at Chicago, Chicago, IL, USA
fYear
2010
fDate
13-17 Dec. 2010
Firstpage
1043
Lastpage
1048
Abstract
Co-clustering was proposed to cluster objects and features simultaneously in order to explore inter-correlated patterns. For example, by analyzing blog click-through data, one can find the group of users who are interested in a specific group of blogs and use it for applications such as recommendation. However, it is usually very difficult to achieve good co-clustering quality by analyzing the object-feature correlation data alone, because the data are sparse and noisy. Meanwhile, one may have prior knowledge that indicates the internal structure of the co-clusters. For instance, one may obtain user cluster information from the social network system, and blog-blog similarity from social tags or content. This prior information provides some supervision toward the co-cluster structure and may help reduce the effect of sparsity and noise. However, most co-clustering algorithms do not use this information and may produce meaningless results. In this paper we study the problem of finding the optimal co-clusters when some objects and features are believed a priori to be in the same cluster. We propose a matrix decomposition based approach that formulates the problem as a trace minimization problem and solves it efficiently with selected eigenvectors. The asymptotic complexity of the proposed approach is the same as that of co-clustering without constraints. Experiments include graph-pattern co-clustering and document-word co-clustering. For instance, on the graph-pattern data set, the proposed model improves the normalized mutual information by as much as 5.5 times and is 10 times faster than two naive solutions that expand the edges and vertices in the graphs.
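The abstract does not spell out the constrained formulation, so the sketch below only illustrates the unconstrained spectral baseline it refers to: a two-way co-clustering obtained from the second singular vectors of the degree-normalized object-feature matrix, in the style of bipartite spectral graph partitioning. It is not the authors' method. The function name `spectral_cocluster_bipartition`, the toy `clicks` matrix, and the use of a sign threshold in place of k-means are all assumptions made for illustration.

```python
import numpy as np


def spectral_cocluster_bipartition(A):
    """Split rows and columns of a nonnegative matrix into two co-clusters.

    Hypothetical helper: unconstrained baseline only, no prior knowledge used.
    """
    A = np.asarray(A, dtype=float)
    d1 = A.sum(axis=1)  # row (object) degrees
    d2 = A.sum(axis=0)  # column (feature) degrees
    if np.any(d1 <= 0) or np.any(d2 <= 0):
        raise ValueError("every row and column needs a positive sum")

    r = 1.0 / np.sqrt(d1)
    c = 1.0 / np.sqrt(d2)
    # Degree-normalized matrix D1^{-1/2} A D2^{-1/2}.
    An = r[:, None] * A * c[None, :]

    # The leading singular pair is trivial; the second pair carries the
    # relaxed solution of the two-way normalized-cut problem.
    U, s, Vt = np.linalg.svd(An, full_matrices=False)
    z_rows = r * U[:, 1]
    z_cols = c * Vt[1, :]

    # Assumption: threshold at zero instead of running k-means on z.
    return z_rows > 0, z_cols > 0


# Toy click-through matrix (made-up data): users 0-1 mostly read blogs 0-2,
# users 2-3 mostly read blogs 3-4; 0.01 stands in for background noise.
clicks = np.full((4, 5), 0.01)
clicks[:2, :3] += 1.0
clicks[2:, 3:] += 1.0

user_labels, blog_labels = spectral_cocluster_bipartition(clicks)
```

Which co-cluster is labeled `True` depends on the arbitrary sign of the singular vectors, so only the grouping is meaningful, not the label values themselves.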
Keywords
constraint handling; correlation methods; eigenvalues and eigenfunctions; graph theory; matrix decomposition; pattern clustering; social networking (online); asymptotic complexity; data sparsity; document-word co-clustering; graph-pattern co-clustering; matrix decomposition based approach; normalized mutual information; object feature analysis; semisupervised spectral coclustering algorithm; social network system; trace minimization problem; Co-clustering; Semi-supervised Learning; Spectral;
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining (ICDM), 2010 IEEE 10th International Conference on
Conference_Location
Sydney, NSW
ISSN
1550-4786
Print_ISBN
978-1-4244-9131-5
Electronic_ISBN
1550-4786
Type
conf
DOI
10.1109/ICDM.2010.64
Filename
5694082