DocumentCode :
3165497
Title :
Stochastic Blockmodel with Cluster Overlap, Relevance Selection, and Similarity-Based Smoothing
Author :
Whang, Joyce Jiyoung ; Rai, Piyush ; Dhillon, Inderjit S.
Author_Institution :
Dept. of Comput. Sci., Univ. of Texas at Austin, Austin, TX, USA
fYear :
2013
fDate :
7-10 Dec. 2013
Firstpage :
817
Lastpage :
826
Abstract :
Stochastic block models provide a rich, probabilistic framework for modeling relational data by expressing the objects being modeled in terms of a latent vector representation. This representation can be a latent indicator vector denoting cluster membership (hard clustering), a vector of cluster membership probabilities (soft clustering), or, more generally, a real-valued vector (latent space representation). Recently, a new class of overlapping stochastic block models has been proposed that allows objects to have hard memberships in multiple clusters (in the form of a latent binary vector). This captures a property of many real-world networks in domains such as biology and social networks, where objects can simultaneously belong to multiple clusters owing to the multiple roles they may have. In this paper, we improve upon this model in three key ways: (1) we extend the overlapping stochastic block model to the bipartite graph case, which enables us to simultaneously learn the overlapping clustering of two different sets of objects in the graph (the unipartite graph is just a special case of our model); (2) we allow objects (in either set) to have no membership in any cluster by using a relevant object selection mechanism; and (3) we make use of additionally available object features (or a kernel matrix of pairwise object similarities) to further improve the overlapping clustering performance. We do this by explicitly encouraging similar objects to have similar cluster membership vectors. Moreover, by using nonparametric Bayesian prior distributions on the key model parameters, we side-step model selection issues such as selecting the number of clusters a priori. Our model is quite general and can be applied to both overlapping clustering and link prediction tasks in unipartite and bipartite networks (directed/undirected), or to overlapping co-clustering of general binary-valued data. Experiments on synthetic and real-world datasets from biology and social networks demonstrate that our model outperforms several state-of-the-art methods.
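Illustrative sketch (not taken from the paper): the abstract's idea of binary, possibly overlapping membership vectors on both sides of a bipartite graph can be pictured as follows. The logistic link, the interaction matrix `W`, the `bias` term, and the function name `link_probabilities` are all assumptions made for illustration; the abstract does not specify the likelihood, the relevance-selection prior, or the similarity-based smoothing, so none of those are modeled here.

```python
import numpy as np

def link_probabilities(Z_rows, Z_cols, W, bias=0.0):
    """Edge probabilities for a bipartite graph from binary memberships.

    Assumed form (hypothetical, for illustration only):
        P(A[i, j] = 1) = sigmoid(Z_rows[i] @ W @ Z_cols[j] + bias)
    """
    logits = Z_rows @ W @ Z_cols.T + bias
    return 1.0 / (1.0 + np.exp(-logits))

# Toy data: 3 row objects, 2 column objects, 2 clusters on each side.
Z_rows = np.array([[1, 0],   # belongs to row cluster 0 only
                   [1, 1],   # overlapping: belongs to both row clusters
                   [0, 0]])  # belongs to no cluster
Z_cols = np.array([[1, 0],
                   [0, 1]])
# Hypothetical cluster-to-cluster interaction weights.
W = np.array([[2.0, -1.0],
              [-1.0, 2.0]])

P = link_probabilities(Z_rows, Z_cols, W, bias=-1.0)
print(P.round(3))
```

In this toy setup, an object whose membership vector is all zeros contributes nothing through `W`, so its link probabilities fall back to the baseline set by `bias`. This is one simple way to picture an object that has no membership in any cluster.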
Keywords :
Bayes methods; biology computing; graph theory; learning (artificial intelligence); matrix algebra; pattern clustering; social networking (online); stochastic processes; vectors; biology; bipartite graph case; bipartite network; cluster membership probability vector; cluster membership vectors; general binary-valued data overlapping co-clustering; hard clustering; kernel matrix; latent binary vector; latent indicator vector representation; latent space representation; link prediction tasks; nonparametric Bayesian prior distributions; overlapping clustering learning; overlapping stochastic block model; pair wise object similarities; probabilistic framework; real-valued vector; real-world networks; relational data modelling; relevant object selection mechanism; similarity-based smoothing; social networks; soft clustering; unipartite graph; unipartite network; Bayes methods; Bipartite graph; Data models; Kernel; Social network services; Stochastic processes; Vectors; link prediction; nonparametric Bayesian; overlapping clustering; relevance selection; stochastic blockmodel;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining (ICDM), 2013 IEEE 13th International Conference on
Conference_Location :
Dallas, TX
ISSN :
1550-4786
Type :
conf
DOI :
10.1109/ICDM.2013.156
Filename :
6729566