DocumentCode :
3601917
Title :
An Optimization-Based Framework for the Transformation of Incomplete Biological Knowledge into a Probabilistic Structure and Its Application to the Utilization of Gene/Protein Signaling Pathways in Discrete Phenotype Classification
Author :
Esfahani, Mohammad Shahrokh ; Dougherty, Edward R.
Author_Institution :
Dept. of Electr. & Comput. Eng., Texas A&M Univ., College Station, TX, USA
Volume :
12
Issue :
6
fYear :
2015
Firstpage :
1304
Lastpage :
1321
Abstract :
Phenotype classification via genomic data is hampered by small sample sizes that negatively impact classifier design. Utilization of prior biological knowledge in conjunction with training data can improve both classifier design and error estimation via the construction of the optimal Bayesian classifier. In the genomic setting, gene/protein signaling pathways provide a key source of biological knowledge. Although these pathways are neither complete nor regulatory, and have no timing associated with them, they are capable of constraining the set of possible models representing the underlying interaction between molecules. The aim of this paper is to provide a framework and the mathematical tools to transform signaling pathways into prior probabilities governing uncertainty classes of feature-label distributions used in classifier design. Structural motifs extracted from the signaling pathways are mapped to a set of constraints on a prior probability on a Multinomial distribution. Because the Dirichlet distribution is the conjugate prior for the Multinomial distribution, we propose optimization paradigms to estimate the parameters of a Dirichlet distribution in the Bayesian setting. The performance of the proposed methods is tested on two widely studied pathways: the mammalian cell cycle and a p53 pathway model.
Keywords :
biochemistry; cellular biophysics; genomics; molecular biophysics; optimisation; probability; proteins; Dirichlet distribution; discrete phenotype classification; error estimation; gene-protein signaling pathways; genomic data; genomic setting; mammalian cell cycle; mathematical tools; multinomial distribution; negatively impact classifier design; optimal Bayesian classifier; optimization paradigms; optimization-based framework; p53 pathway model; probabilistic structure; training data; Bayes methods; Bioinformatics; Computational biology; Genomics; Proteins; Phenotype classification; biological pathways; optimal Bayesian classifier; prior probability construction; regularized expected mean log-likelihood;
fLanguage :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2015.2424407
Filename :
7089209