DocumentCode
2950409
Title
Benchmarking of gene prediction programs for metagenomic data
Author
Yok, Non ; Rosen, Gail
Author_Institution
Electr. & Comput. Eng. Dept., Drexel Univ., Philadelphia, PA, USA
fYear
2010
fDate
Aug. 31 2010-Sept. 4 2010
Firstpage
6190
Lastpage
6193
Abstract
This manuscript presents the most rigorous benchmarking of gene annotation algorithms for metagenomic datasets to date. We compare three programs: GeneMark, MetaGeneAnnotator (MGA), and Orphelia. The comparison is based on their performance on simulated fragments from one hundred species of diverse lineages. We define four types of fragments: two come from the inter- and intra-coding regions, and the other two come from the gene edges. Hoff et al. used only 12 species in their comparison, a sample too small to represent an environmental sample. Also, no previous study has examined fragments that contain gene edges separately from intra-coding fragments. In general, the performance of all three programs improves as the fragment length increases. In addition, intra-coding fragments show lower annotation error than gene-edge fragments in all of the programs. Overall, we obtained an upper-bound performance by combining all the methods.
Keywords
bioinformatics; genetics; genomics; GeneMark; MetaGeneAnnotator; Orphelia; benchmarking; gene annotation algorithms; gene edge fragments; gene prediction programs; intra-coding fragments; intracoding regions; metagenomic data; Benchmark testing; Bioinformatics; Encoding; Genomics; Hidden Markov models; Measurement uncertainty; Sensitivity; Algorithms; Benchmarking; Databases, Genetic; Metagenomics; Molecular Sequence Annotation; ROC Curve
fLanguage
English
Publisher
IEEE
Conference_Titel
Engineering in Medicine and Biology Society (EMBC), 2010 Annual International Conference of the IEEE
Conference_Location
Buenos Aires
ISSN
1557-170X
Print_ISBN
978-1-4244-4123-5
Type
conf
DOI
10.1109/IEMBS.2010.5627744
Filename
5627744