DocumentCode :
1814575
Title :
Learning yeast gene functions from heterogeneous sources of data using hybrid weighted Bayesian networks
Author :
Deng, Xutao ; Geng, Huimin ; Ali, Hesham
Author_Institution :
Dept of Comput. Sci., Nebraska Univ., Omaha, NE, USA
fYear :
2005
fDate :
8-11 Aug. 2005
Firstpage :
25
Lastpage :
34
Abstract :
We developed a machine learning system for determining gene functions from heterogeneous data sources using a Weighted Naive Bayesian Network (WNB). Knowledge of gene functions is crucial for understanding many fundamental biological mechanisms, such as regulatory pathways, cell cycles and diseases. Our major goal is to accurately infer the functions of putative genes or ORFs (Open Reading Frames) from existing databases using computational methods. However, this task is intrinsically difficult because the underlying biological processes involve complex interactions among multiple entities. Therefore, many functional links would be missed when only one or two sources of data are used in the prediction. Our hypothesis is that integrating evidence from multiple complementary sources could significantly improve prediction accuracy. In this paper, our experimental results not only suggest that this hypothesis is valid, but also provide guidelines for using the WNB system for data collection, training and prediction. The combined training data sets contain information from gene annotations, gene expression, clustering outputs, keyword annotations and sequence homology from public databases. The current system is trained and tested on the genes of the budding yeast Saccharomyces cerevisiae. Our WNB model can also be used to analyze the contribution of each source of information to the prediction performance through the weight training process. This contribution analysis could potentially lead to significant scientific discovery by facilitating the interpretation and understanding of the complex relationships between biological entities.
Keywords :
belief networks; biology computing; cellular biophysics; diseases; genetics; learning (artificial intelligence); microorganisms; pattern clustering; Open Reading Frames; Saccharomyces cerevisiae; Weighted Naive Bayesian Network; biological mechanism; budding yeast; cell cycles; clustering outputs; computational method; data collection; data sets; diseases; gene annotation; gene expression; gene functions; keyword annotation; machine learning system; public databases; regulatory pathways; sequence homology; Accuracy; Bayesian methods; Biological processes; Biology computing; Cells (biology); Databases; Diseases; Fungi; Guidelines; Learning systems; Bayesian network; gene function prediction; machine learning; yeast;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computational Systems Bioinformatics Conference, 2005. Proceedings. 2005 IEEE
Print_ISBN :
0-7695-2344-7
Type :
conf
DOI :
10.1109/CSB.2005.38
Filename :
1498003