DocumentCode :
3520013
Title :
Combining Hierarchical Inference in Ontologies with Heterogeneous Data Sources Improves Gene Function Prediction
Author :
Jiang, Xiaoyu ; Nariai, Naoki ; Steffen, Martin ; Kasif, Simon ; Gold, David ; Kolaczyk, Eric D.
Author_Institution :
Dept. of Math. & Stat., Boston Univ., Boston, MA
fYear :
2008
fDate :
3-5 Nov. 2008
Firstpage :
411
Lastpage :
416
Abstract :
The study of gene function is critical in various genomic and proteomic fields. Because large amounts of many different types of protein data are now available, integrating these datasets to predict function has become a significant opportunity in computational biology. In this paper, to predict protein function we (i) develop a novel Bayesian framework that combines relational, hierarchical, and structural information and uses data more efficiently than similar methods, and (ii) propose to use it in conjunction with an integrative protein-protein association network, STRING (Search Tool for the Retrieval of INteracting Genes/proteins), which combines information from seven different sources. At the heart of our work is accomplishing protein data integration in a concerted fashion with respect to both algorithm and data source. Method performance is assessed by 5-fold cross-validation in yeast on selected terms from the Molecular Function ontology in the Gene Ontology database. Results show that our combined use of the proposed computational framework and the protein network from STRING offers substantial improvements in prediction. The benefits of using an aggressively integrative network such as STRING may derive from the fact that, although the ultimate gene interaction matrix (including but not limited to protein-protein, genetic, or regulatory interactions) is likely to be sparse, it is presently known only incompletely in most organisms, and thus the use of multiple distinct data sources is rewarded.
Keywords :
biology computing; cellular biophysics; database management systems; genetics; molecular biophysics; ontologies (artificial intelligence); proteins; Bayesian framework; STRING; computational biology; gene function prediction; gene interactions; gene ontology database; genomic field; heterogeneous data sources; hierarchical inference; molecular function ontology; ontologies; protein; protein-protein association network; protein-protein interactions; proteomic field; regulatory interactions; search tool; yeast; Bayesian methods; Bioinformatics; Computational biology; Fungi; Genomics; Heart; Information retrieval; Ontologies; Protein engineering; Proteomics; information integration; protein function prediction
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Bioinformatics and Biomedicine, 2008. BIBM '08. IEEE International Conference on
Conference_Location :
Philadelphia, PA
Print_ISBN :
978-0-7695-3452-7
Type :
conf
DOI :
10.1109/BIBM.2008.37
Filename :
4684930