Title :
Mining new protein-protein interactions
Author :
Mamitsuka, Hiroshi
Author_Institution :
Inst. for Chem. Res., Kyoto Univ., Japan
Abstract :
One of the most reliable ways to determine the function of a protein whose function is unknown is to check whether it interacts with another protein of known function. The problem addressed here is to combine protein class information with data on protein-protein interactions in order to systematically predict unknown protein-protein interactions, and it is solved by a probabilistic model-based approach. In general, a probabilistic model for clustering is a latent-variable model in which each value of the latent variable corresponds to a cluster. This paper extends this typical latent-variable model to a model with two latent variables arranged hierarchically, one at a lower level and one at an upper level. This hierarchical probabilistic structure is the key feature of the model. The hierarchical dependency is reasonable because, although which protein pairs interact can be observed, which pair of protein classes is useful for predicting protein-protein interactions cannot. The structure makes it possible to treat a set of protein classes systematically as a latent variable of the observable protein-protein interactions. Results show that the approach, which captures guilt-by-association interactions, outperformed existing methods and support vector machines by a statistically significant margin in predicting new protein-protein interactions. Thus, protein-protein interactions should be predicted with an unsupervised learning approach, and the performance of the hierarchical aspect model (HAM) should then be compared with that of other unsupervised methods.
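As an illustration of the typical latent-variable clustering model that the abstract takes as its starting point, the following is a minimal sketch of a single-latent-variable aspect model fitted by EM, assuming the common form p(x, y) = sum over z of p(z) p(x|z) p(y|z). It is not the hierarchical aspect model (HAM) itself, whose equations the abstract does not give. The function names `fit_aspect_model` and `pair_probability`, the integer protein indices, and the toy interaction list are all hypothetical.

```python
import random


def fit_aspect_model(pairs, n_items, n_clusters, n_iter=50, seed=0):
    """Fit p(x, y) = sum_z p(z) p(x|z) p(y|z) to observed (x, y) pairs by EM.

    Assumption: proteins are encoded as integers 0..n_items-1 and each
    observed interaction is one (x, y) tuple. This is the generic aspect
    model, not the paper's hierarchical extension.
    """
    rng = random.Random(seed)

    def random_dist(n):
        # Random positive weights normalized to sum to one.
        weights = [rng.random() + 0.1 for _ in range(n)]
        total = sum(weights)
        return [w / total for w in weights]

    p_z = random_dist(n_clusters)
    p_x_z = [random_dist(n_items) for _ in range(n_clusters)]
    p_y_z = [random_dist(n_items) for _ in range(n_clusters)]

    for _ in range(n_iter):
        # Expected counts, with a tiny constant to avoid zero probabilities.
        z_count = [1e-9] * n_clusters
        x_count = [[1e-9] * n_items for _ in range(n_clusters)]
        y_count = [[1e-9] * n_items for _ in range(n_clusters)]

        for x, y in pairs:
            # E-step: posterior probability of each cluster for this pair.
            joint = [p_z[k] * p_x_z[k][x] * p_y_z[k][y]
                     for k in range(n_clusters)]
            total = sum(joint)
            for k in range(n_clusters):
                resp = joint[k] / total
                z_count[k] += resp
                x_count[k][x] += resp
                y_count[k][y] += resp

        # M-step: renormalize the expected counts into probabilities.
        z_total = sum(z_count)
        p_z = [c / z_total for c in z_count]
        p_x_z = [[c / sum(row) for c in row] for row in x_count]
        p_y_z = [[c / sum(row) for c in row] for row in y_count]

    return p_z, p_x_z, p_y_z


def pair_probability(model, x, y):
    """Model probability of the pair (x, y), marginalized over clusters."""
    p_z, p_x_z, p_y_z = model
    return sum(p_z[k] * p_x_z[k][x] * p_y_z[k][y] for k in range(len(p_z)))


if __name__ == "__main__":
    # Hypothetical toy data: two groups of three proteins each.
    toy_pairs = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
    model = fit_aspect_model(toy_pairs, n_items=6, n_clusters=2)
    print(pair_probability(model, 0, 1), pair_probability(model, 0, 4))
```

In a sketch like this, unobserved pairs can be ranked by `pair_probability` to suggest candidate interactions. The hierarchical model described in the abstract would add a second, upper-level latent variable over protein classes, which is not shown here.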
Keywords :
biology computing; data mining; molecular biophysics; physiological models; proteins; unsupervised learning; clustering; data mining; guilt-by-association interactions; hierarchical aspect model; hierarchical probabilistic structure; latent-variable model; probabilistic model-based approach; protein-protein interactions; unsupervised learning approach; Biochemistry; Bioinformatics; Data mining; Databases; Fungi; Genomics; Machine learning; Protein engineering; Random variables; Sequences; Algorithms; Computer Simulation; Databases, Protein; Gene Expression Profiling; Information Storage and Retrieval; Models, Biological; Models, Chemical; Models, Statistical; Pattern Recognition, Automated; Protein Interaction Mapping; Proteome;
Journal_Title :
Engineering in Medicine and Biology Magazine, IEEE
DOI :
10.1109/MEMB.2005.1436467