DocumentCode :
573715
Title :
Predicting protein complexes via the integration of multiple biological information
Author :
Tang, Xiwei ; Wang, Jianxin ; Pan, Yi
Author_Institution :
Sch. of Inf. Sci. & Eng., Central South Univ., Changsha, China
fYear :
2012
fDate :
18-20 Aug. 2012
Firstpage :
174
Lastpage :
179
Abstract :
Protein complexes are a cornerstone of many biological processes, and together they form various types of molecular machinery that perform a vast array of biological functions. The growing amount of protein-protein interaction (PPI) data has enabled a number of computational methods for predicting protein complexes. Many of these algorithms consider only the PPI data. However, PPI data from high-throughput techniques are flooded with false interactions, and this insufficiency of the PPI data significantly lowers the accuracy of such methods. In the current work, we develop a novel method named CMBI to discover protein complexes via the integration of multiple biological resources, including gene expression profiles, essential protein information and PPI data. First, CMBI defines the functional similarity (FS) of each pair of interacting proteins based on the edge-clustering coefficient (ECC) from the PPI network and the Pearson correlation coefficient (PCC) from the gene expression data. Second, CMBI selects essential proteins as seeds to build the protein complex cores. During the growth process, the seeds' essential protein neighbors, and the neighbors whose FS with the seeds is greater than the threshold T, are added to the complex cores. Once the complex cores are constructed, CMBI generates protein complexes by attaching to the cores their direct neighbors with FS > T. In addition to the essential proteins, CMBI also uses other proteins as seeds to expand protein complexes. To evaluate the performance of CMBI, we compare the complexes it discovers with those found by other techniques by matching the predicted complexes against reference complexes. We subsequently use GO::TermFinder to analyze the complexes predicted by the various methods. Finally, the effect of the parameter T is investigated.
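The abstract describes CMBI only at a high level, so the following is a minimal Python sketch of a seed-and-grow procedure in that spirit, not the authors' implementation. Several details are assumptions not stated in the abstract: the ECC variant (shared neighbors divided by min(deg(u) - 1, deg(v) - 1)), combining ECC and PCC into FS by multiplication, attaching a core neighbor when its mean FS with the core members it interacts with exceeds T, letting non-essential proteins seed a complex only when they are not yet covered, and the minimum complex size `min_size`. All function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient (PCC) of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # Constant profiles have an undefined PCC; treat them as uncorrelated.
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def ecc(adj, u, v):
    """Edge-clustering coefficient of edge (u, v), ASSUMED variant:
    shared neighbors / min(deg(u) - 1, deg(v) - 1)."""
    shared = len(adj[u] & adj[v])
    denom = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return shared / denom if denom > 0 else 0.0


def functional_similarity(adj, expr, u, v):
    """FS of two interacting proteins. ASSUMPTION: FS = ECC * PCC."""
    return ecc(adj, u, v) * pearson(expr[u], expr[v])


def predict_complexes(adj, expr, essential, T, min_size=3):
    """Hypothetical CMBI-style seed-and-grow sketch.

    adj: protein -> set of interacting proteins (must be symmetric)
    expr: protein -> list of expression values
    essential: set of essential proteins
    """
    def fs(u, v):
        return functional_similarity(adj, expr, u, v)

    # Essential proteins are used as seeds first, then the other proteins.
    seeds = sorted(p for p in adj if p in essential)
    seeds += sorted(p for p in adj if p not in essential)
    complexes, covered = [], set()
    for seed in seeds:
        # ASSUMPTION: a non-essential protein seeds a complex only if it is
        # not already part of a predicted complex.
        if seed not in essential and seed in covered:
            continue
        # Core: the seed, its essential neighbors, and neighbors with FS > T.
        core = {seed} | {n for n in adj[seed] if n in essential or fs(seed, n) > T}
        # Attach direct neighbors of the core. ASSUMPTION: a neighbor joins
        # if its mean FS with the core members it interacts with exceeds T.
        complex_ = set(core)
        for n in set().union(*(adj[c] for c in core)) - core:
            links = [fs(n, c) for c in core if c in adj[n]]
            if sum(links) / len(links) > T:
                complex_.add(n)
        # Keep only sufficiently large, previously unseen complexes.
        if len(complex_) >= min_size and complex_ not in complexes:
            complexes.append(complex_)
            covered |= complex_
    return complexes
```

On a toy network made of two triangles {A, B, C} and {D, E, F} joined by the edge C-D, with expression profiles that are correlated inside each triangle and essential set {"A"}, `predict_complexes(adj, expr, {"A"}, T=0.5)` returns the two triangles: the bridging edge C-D has no shared neighbors, so its ECC, and therefore its FS, is 0.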
The results from GO functional enrichment and matching analyses show that CMBI performs significantly better than state-of-the-art methods. This indicates that integrating multiple sources of biological information is an effective way to identify protein complexes in the PPI network.
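The abstract does not define its matching or enrichment criteria. As a hedged illustration, the sketch below uses two widely used conventions: the neighborhood affinity (overlap) score with an assumed threshold of 0.2 to decide whether a predicted complex matches a reference complex, and the hypergeometric tail probability, which is the statistic GO::TermFinder uses to score GO term enrichment (its multiple-testing corrections are omitted here). The threshold and all function names are hypothetical, not taken from the paper.

```python
from math import comb


def overlap_score(predicted, reference):
    """Neighborhood affinity score |P & R|^2 / (|P| * |R|)."""
    shared = len(predicted & reference)
    return shared * shared / (len(predicted) * len(reference))


def match_stats(predicted, references, omega=0.2):
    """Precision, recall and F-measure under an overlap threshold omega.

    ASSUMPTION: omega = 0.2 is a common convention, not taken from the paper.
    """
    def matched(a, pool):
        return any(overlap_score(a, b) >= omega for b in pool)

    # Count predicted complexes that match some reference, and vice versa.
    hits_pred = sum(matched(p, references) for p in predicted)
    hits_ref = sum(matched(r, predicted) for r in references)
    precision = hits_pred / len(predicted) if predicted else 0.0
    recall = hits_ref / len(references) if references else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def hypergeometric_pvalue(N, K, n, k):
    """P(X >= k): chance that a complex of n proteins drawn from N contains
    at least k of the K proteins annotated with a GO term (uncorrected)."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)
```

For example, a predicted complex {A, B, C} and a reference complex {A, B, C, X} have an overlap score of 9 / 12 = 0.75, so they count as a match under the assumed threshold.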
Keywords :
biochemistry; biological techniques; correlation methods; genetics; molecular biophysics; proteins; CMBI method; GO functional enrichment analysis; PPI network; Pearson correlation coefficient; biological functions; biological processing; computational methods; cornerstone; edge-clustering coefficient; false interactions; gene expression data; gene expression profiles; high-throughput techniques; matching analysis; molecular machinery; multiple biological information integration; parameter T effect; predicted complexes; predicting protein complexes; protein complex cores; protein-protein interaction data; reference complexes; state-of-the-art methods; Biological information theory; Clustering algorithms; Irrigation; Proteins; USA Councils;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Systems Biology (ISB), 2012 IEEE 6th International Conference on
Conference_Location :
Xi'an
Print_ISBN :
978-1-4673-4396-1
Electronic_ISBN :
978-1-4673-4397-8
Type :
conf
DOI :
10.1109/ISB.2012.6314132
Filename :
6314132