Title :
Co-clustering as multilinear decomposition with sparse latent factors
Author :
Papalexakis, Evangelos E. ; Sidiropoulos, Nicholas D.
Author_Institution :
Dept. of ECE, Tech. Univ. Crete, Chania, Greece
Abstract :
The K-means clustering problem seeks to partition the columns of a data matrix into subsets, such that columns in the same subset are 'close' to each other. The co-clustering problem seeks to simultaneously partition the rows and columns of a matrix to produce 'coherent' groups called co-clusters. Co-clustering has recently found numerous applications in diverse areas. The concept readily generalizes to higher-way data sets (e.g., adding a temporal dimension). Starting from K-means, we show how co-clustering can be formulated as a constrained multilinear decomposition with sparse latent factors. In the case of three- and higher-way data, this corresponds to a PARAFAC decomposition with sparse latent factors. This is important because PARAFAC is unique under mild conditions, and sparsity further improves identifiability. This allows us to uniquely unravel a large number of possibly overlapping co-clusters that are hidden in the data. Interestingly, the imposition of latent sparsity pays a collateral dividend: as one increases the number of fitted co-clusters, new co-clusters are added without affecting those previously extracted. An important corollary is that co-clusters can be extracted incrementally; this implies that the algorithm scales well for large datasets. We demonstrate the validity of our approach using the ENRON corpus, as well as synthetic data.
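Illustration (not from the paper): the idea of reading co-clusters off the supports of sparse latent factors can be sketched for the two-way (matrix) case roughly as follows. This is a minimal sketch under stated assumptions, not the authors' algorithm: it fits one nonnegative rank-one term at a time with an L1 penalty and then subtracts it from the data (simple deflation, swapped in here to mimic incremental extraction). The function names, the penalty weight `lam`, and the iteration count are all hypothetical choices.

```python
import numpy as np

def soft_threshold(v, t):
    """Nonnegative soft-thresholding: shrink by t, then clip at zero."""
    return np.maximum(v - t, 0.0)

def sparse_rank_one(X, lam=0.5, n_iter=50):
    """Alternating minimization of
    ||X - a b^T||_F^2 + lam * (sum(a) + sum(b)),  with a, b >= 0.
    Each update is the exact minimizer over one factor with the other fixed."""
    eps = 1e-12
    # Assumed initialization: column-wise absolute sums, normalized.
    b = np.abs(X).sum(axis=0).astype(float)
    b /= np.linalg.norm(b) + eps
    a = np.zeros(X.shape[0])
    for _ in range(n_iter):
        a = soft_threshold(X @ b, lam / 2) / (b @ b + eps)
        b = soft_threshold(X.T @ a, lam / 2) / (a @ a + eps)
    return a, b

def extract_coclusters(X, k, lam=0.5, n_iter=50):
    """Extract k co-clusters one at a time by deflation (an assumption of
    this sketch). Each co-cluster is the pair (row support, column support)
    of the fitted sparse factors."""
    R = np.array(X, dtype=float)
    coclusters = []
    for _ in range(k):
        a, b = sparse_rank_one(R, lam, n_iter)
        coclusters.append((np.flatnonzero(a), np.flatnonzero(b)))
        R = R - np.outer(a, b)  # remove the fitted term before the next pass
    return coclusters

if __name__ == "__main__":
    # Synthetic data with two planted, non-overlapping blocks.
    X = np.zeros((8, 6))
    X[:4, :3] = 2.0
    X[5:, 4:] = 1.0
    for rows, cols in extract_coclusters(X, k=2):
        print("rows:", rows.tolist(), "cols:", cols.tolist())
```

For three-way data the same pattern would add a third sparse factor per term, giving a PARAFAC-style model; that extension is omitted here to keep the sketch short.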
Keywords :
matrix algebra; pattern clustering; set theory; ENRON corpus; K-means clustering problem; PARAFAC decomposition; multilinear decomposition coclustering; sparse latent factors; subset data matrix; Electronic mail; Law; Matrices; Noise; Social network services; Sparse matrices;
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on
Conference_Location :
Prague
Print_ISBN :
978-1-4577-0538-0
Electronic_ISBN :
1520-6149
DOI :
10.1109/ICASSP.2011.5946731