Title :
Improving operon prediction in E. coli
Author :
Dam, Phuongan ; Olman, Victor ; Xu, Ying
Author_Institution :
Dept. of Biochem. & Molecular Biol., Georgia Univ., Athens, GA, USA
Abstract :
In bacteria, genes working in the same pathway or interacting with each other are often organized into operons. Currently, the prediction accuracy for operon/boundary gene pairs is fairly good in Escherichia coli. However, a high level of success in recognizing a gene pair as a boundary or operon pair does not automatically translate into a high level of accuracy in predicting the boundaries of operons. We found that, for several operon prediction programs, the prediction accuracy is often lower when the intergenic region of a gene pair is between 40 and 250 base pairs. In our approach, multiple features of the intergenic region, gene length and available microarray data in E. coli were used to improve the accuracy of operon prediction programs in general, and for gene pairs in the above intergenic distance range in particular. These features were scored according to a log likelihood formula, and the results suggest that we can gain an increase of up to 8% in accuracy for gene pairs with an intergenic distance of 40-250 base pairs. For other regions, the newly added features also give a moderate improvement in prediction accuracy. Furthermore, the accuracy of transcript boundary prediction is also improved compared with methods that use the intergenic distance and functional annotation alone. We are currently fine-tuning our program to predict all operons in E. coli, and applying this method to predict operons in other organisms.
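The abstract does not give the log likelihood formula itself. As an illustration only, the sketch below shows one common way such a score can be built: for each binned feature value, take the log of the ratio between its frequency among known operon pairs and its frequency among known boundary pairs, then sum the per-feature scores. The function names, the bin edges at 40 and 250 bp, the pseudocount, and the toy distances are all assumptions made for this example, not the authors' implementation or data.

```python
import math
from collections import Counter


def bin_distance(d, edges=(40, 250)):
    """Map an intergenic distance in bp to a bin index (assumed bin edges)."""
    for i, edge in enumerate(edges):
        if d < edge:
            return i
    return len(edges)


def log_likelihood_table(operon_bins, boundary_bins, n_bins, pseudocount=1.0):
    """Per-bin log ratio of P(bin | operon pair) to P(bin | boundary pair).

    A pseudocount keeps empty bins from producing log(0) or division by zero.
    """
    op = Counter(operon_bins)
    bd = Counter(boundary_bins)
    op_total = len(operon_bins) + pseudocount * n_bins
    bd_total = len(boundary_bins) + pseudocount * n_bins
    return [
        math.log(((op[b] + pseudocount) / op_total)
                 / ((bd[b] + pseudocount) / bd_total))
        for b in range(n_bins)
    ]


def score_pair(feature_bins, tables):
    """Sum the per-feature log likelihood scores for one gene pair."""
    return sum(tables[name][b] for name, b in feature_bins.items())


# Toy training distances (hypothetical, for illustration only).
operon_d = [5, 10, 20, 30, 60, 100]
boundary_d = [80, 150, 200, 300, 400, 500]

dist_table = log_likelihood_table(
    [bin_distance(d) for d in operon_d],
    [bin_distance(d) for d in boundary_d],
    n_bins=3,
)
tables = {"distance": dist_table}

# A positive total favors "operon pair", a negative total favors "boundary pair".
short_gap = score_pair({"distance": bin_distance(15)}, tables)
long_gap = score_pair({"distance": bin_distance(320)}, tables)
```

Further features (for example gene length or microarray expression correlation) could be added as extra entries in `tables`, each with its own binning; summing the scores treats the features as independent, which is a simplifying assumption of this sketch.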
Keywords :
biology computing; genetics; maximum likelihood estimation; microorganisms; E. coli bacterium; boundary gene pair; functional annotation; intergenic distance; intergenic region; log likelihood formula; microarray data; operon prediction program; transcript boundary prediction; Accuracy; Biochemistry; Bioinformatics; Biology computing; Computational biology; Gene expression; Genomics; Laboratories; Organisms; Phylogeny;
Conference_Title :
Computational Systems Bioinformatics Conference, 2005. Workshops and Poster Abstracts. IEEE
Print_ISBN :
0-7695-2442-7
DOI :
10.1109/CSBW.2005.76