Title :
Non-parametric co-clustering of large scale sparse bipartite networks on the GPU
Author :
Hansen, Toke Jansen ; Mørup, Morten ; Hansen, Lars Kai
Author_Institution :
Sect. for Cognitive Syst., Tech. Univ. of Denmark, Lyngby, Denmark
Abstract :
Co-clustering is a problem of both theoretical and practical importance, with applications in, e.g., market basket analysis, collaborative filtering, and web scale text processing. We state the co-clustering problem in terms of non-parametric generative models, which can address the issue of estimating the number of row and column clusters from a hypothesis space of an infinite number of clusters. To reach large scale applications of co-clustering, we exploit the fact that parameter inference for co-clustering is well suited to parallel computing. We develop a generic GPU framework for efficient inference on large scale sparse bipartite networks and achieve a speedup of two orders of magnitude compared to estimation based on conventional CPUs. In terms of scalability, we find that for networks with more than 100 million links reliable inference can be achieved in less than an hour on a single GPU. To efficiently manage memory consumption on the GPU, we exploit the structure of the posterior likelihood to obtain a decomposition that easily allows model estimation of the co-clustering problem on arbitrarily large networks as well as distributed estimation on multiple GPUs. Finally, we evaluate the implementation on real-life large scale collaborative filtering data and web scale text corpora, demonstrating that the latent mesoscale structures extracted by the co-clustering problem, as formulated by the Infinite Relational Model (IRM), are consistent across consecutive runs with different initializations and are also relevant for interpreting the underlying processes in such large scale networks.
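The decomposition mentioned in the abstract can be illustrated with a minimal sketch. It assumes a Bernoulli link model with a Beta(a, b) prior on each block's link probability, which is a common way to write the IRM but is not spelled out in the abstract. Under that assumption the collapsed likelihood depends on the data only through the number of links between each (row cluster, column cluster) pair, and those counts simply add up over chunks of the edge list, so chunks could in principle be processed one at a time or on separate devices. All function and variable names are hypothetical, the code runs on the CPU, and the edge list is assumed to contain no duplicate links.

```python
from collections import Counter
from math import lgamma


def log_beta(x, y):
    """Logarithm of the Beta function B(x, y)."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)


def link_counts(edges, row_z, col_z):
    """Count links between each (row cluster, column cluster) pair."""
    counts = Counter()
    for i, j in edges:
        counts[(row_z[i], col_z[j])] += 1
    return counts


def chunked_link_counts(edges, row_z, col_z, chunk_size):
    """Same counts, accumulated one chunk of the edge list at a time."""
    total = Counter()
    for start in range(0, len(edges), chunk_size):
        chunk = edges[start:start + chunk_size]
        total.update(link_counts(chunk, row_z, col_z))  # counts are additive
    return total


def collapsed_log_likelihood(counts, row_z, col_z, a=1.0, b=1.0):
    """Assumed Beta-Bernoulli collapsed log-likelihood, summed over blocks."""
    row_sizes = Counter(row_z)
    col_sizes = Counter(col_z)
    ll = 0.0
    for k, n_k in row_sizes.items():
        for l, m_l in col_sizes.items():
            links = counts.get((k, l), 0)
            non_links = n_k * m_l - links  # possible pairs minus observed links
            ll += log_beta(links + a, non_links + b) - log_beta(a, b)
    return ll


# Toy bipartite network: 4 rows, 3 columns, 5 links (hypothetical data).
edges = [(0, 0), (0, 1), (1, 0), (2, 2), (3, 2)]
row_z = [0, 0, 1, 1]  # cluster assignment of each row
col_z = [0, 0, 1]     # cluster assignment of each column

full = link_counts(edges, row_z, col_z)
chunked = chunked_link_counts(edges, row_z, col_z, chunk_size=2)
ll_full = collapsed_log_likelihood(full, row_z, col_z)
ll_chunked = collapsed_log_likelihood(chunked, row_z, col_z)
```

Because only the small table of per-block counts has to be kept between chunks, memory use in this sketch is governed by the chunk size and the number of clusters rather than by the total number of links.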
Keywords :
Internet; computer graphic equipment; coprocessors; inference mechanisms; information filtering; parallel processing; pattern clustering; text analysis; GPU; Web scale text corpora; Web scale text processing; collaborative filtering; infinite relational model; large scale sparse bipartite networks; market basket analysis; nonparametric coclustering; nonparametric generative models; parallel computing; parameter inference; posterior likelihood; Collaboration; Data models; Generators; Graphics processing unit; Instruction sets; Memory management; Motion pictures;
Conference_Title :
Machine Learning for Signal Processing (MLSP), 2011 IEEE International Workshop on
Conference_Location :
Santander
Print_ISBN :
978-1-4577-1621-8
Electronic_ISBN :
1551-2541
DOI :
10.1109/MLSP.2011.6064611