Title :
Protein Complexes Discovery Based on Protein-Protein Interaction Data via a Regularized Sparse Generative Network Model
Author :
Zhang, Xiao-Fei ; Dai, Dao-Qing ; Li, Xiao-Xin
Author_Institution :
Center for Comput. Vision & Dept. of Math., Sun Yat-Sen Univ., Guangzhou, China
Abstract :
Detecting protein complexes from protein interaction networks is a major task in the postgenome era. Previously developed computational algorithms for identifying complexes mainly focus on graph partitioning or dense-region finding. Most of these traditional algorithms cannot discover overlapping complexes, which do exist in protein-protein interaction (PPI) networks. Although some density-based methods have been developed to identify overlapping complexes, they cannot discover complexes that include peripheral proteins. In this study, motivated by the recent successful application of generative network models to describe the generation process of PPI networks and to detect communities in social networks, we develop a regularized sparse generative network model (RSGNM) for protein complex identification. RSGNM extends an existing generative network model by adding a process that generates propensities from an exponential distribution and by incorporating a Laplacian regularizer. Because the propensities are assumed to follow an exponential distribution, their estimators are sparse, which not only has a good biological interpretation but also helps control the overlapping rate among detected complexes. The Laplacian regularizer makes the estimators of propensities smoother over the interaction network. Experimental results on three yeast PPI networks show that RSGNM outperforms six competing algorithms in terms of the quality of detected complexes. In addition, RSGNM can detect overlapping complexes and complexes that include peripheral proteins simultaneously. These results give new insights into the importance of generative network models in protein complex identification.
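The two regularizing ingredients named in the abstract can be illustrated with a small sketch. The abstract does not give the model's equations, so everything below is an assumption made for illustration: the base model is taken to be a Poisson edge model with rate sum_k theta_ik * theta_jk, the exponential prior is written as its negative log density (a linear penalty on nonnegative propensities), and the Laplacian regularizer is written as trace(theta^T L theta) with L = D - A. The function names and the weights `lam` and `gamma` are hypothetical and are not taken from the paper.

```python
import numpy as np

def poisson_nll(A, theta, eps=1e-12):
    """Negative log-likelihood (up to constants) of an ASSUMED Poisson edge model
    with rate_ij = sum_k theta_ik * theta_jk, summed over node pairs i < j."""
    rate = theta @ theta.T
    iu = np.triu_indices_from(A, k=1)
    return float(np.sum(rate[iu] - A[iu] * np.log(rate[iu] + eps)))

def sparsity_penalty(theta, lam):
    """Negative log of an exponential prior with rate lam on nonnegative theta,
    dropping the constant term: lam * sum(theta). This acts like an L1 penalty."""
    return float(lam * theta.sum())

def laplacian_penalty(A, theta):
    """Graph-smoothness term trace(theta^T L theta) with L = D - A.
    For symmetric A it equals the sum over edges of ||theta_i - theta_j||^2."""
    L = np.diag(A.sum(axis=1)) - A
    return float(np.trace(theta.T @ L @ theta))

def objective(A, theta, lam=1.0, gamma=1.0):
    """Hypothetical penalized objective combining the three terms above."""
    A = np.asarray(A, dtype=float)
    theta = np.asarray(theta, dtype=float)
    return (poisson_nll(A, theta)
            + sparsity_penalty(theta, lam)
            + gamma * laplacian_penalty(A, theta))
```

Under these assumptions, increasing `lam` pushes more propensities toward zero, so each protein is assigned to fewer complexes, while increasing `gamma` penalizes interacting proteins whose propensity vectors differ.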
Keywords :
biochemistry; biology computing; exponential distribution; genomics; molecular biophysics; physiological models; proteins; Laplacian regularizer; competing algorithms; computational algorithms; density-based methods; detecting protein complexes; generation processing; generative network model; peripheral proteins; postgenome era; protein complexes discovery; protein complexes identification; protein-protein interaction data; protein-protein interaction networks; regularized sparse generative network model; traditional algorithms; Biological system modeling; Communities; Exponential distribution; Polymers; Proteins; RNA; Protein complex; overlapping complex; peripheral protein; protein-protein interaction network; regularization method; Algorithms; Databases, Protein; Models, Theoretical; Protein Interaction Mapping;
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
DOI :
10.1109/TCBB.2012.20