• DocumentCode
    2950409
  • Title

    Benchmarking of gene prediction programs for metagenomic data

  • Author

    Yok, Non ; Rosen, Gail

  • Author_Institution
    Electr. & Comput. Eng. Dept., Drexel Univ., Philadelphia, PA, USA
  • fYear
    2010
  • fDate
    Aug. 31 2010-Sept. 4 2010
  • Firstpage
    6190
  • Lastpage
    6193
  • Abstract
    This manuscript presents the most rigorous benchmarking of gene annotation algorithms for metagenomic datasets to date. We compare three programs: GeneMark, MetaGeneAnnotator (MGA) and Orphelia. The comparisons are based on their performance over simulated fragments from one hundred species of diverse lineages. We define four types of fragments: two types come from the inter- and intra-coding regions, and the other two come from the gene edges. Hoff et al. used only 12 species in their comparison; their sample is therefore too small to represent an environmental sample. Also, no predecessor has separately examined fragments that contain gene edges as opposed to intra-coding regions. In general, the performance of all three programs improves as fragment length increases. In addition, intra-coding fragments show lower annotation error than gene edge fragments in all of the programs. Overall, we found an upper-bound performance by combining all the methods.
  • Keywords
    bioinformatics; genetics; genomics; GeneMark; MetaGeneAnnotator; Orphelia; benchmarking; gene annotation algorithms; gene edge fragments; gene prediction programs; intra-coding fragments; intracoding regions; metagenomic data; Benchmark testing; Bioinformatics; Encoding; Genomics; Hidden Markov models; Measurement uncertainty; Sensitivity; Algorithms; Benchmarking; Databases, Genetic; Metagenomics; Molecular Sequence Annotation; ROC Curve;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Engineering in Medicine and Biology Society (EMBC), 2010 Annual International Conference of the IEEE
  • Conference_Location
    Buenos Aires
  • ISSN
    1557-170X
  • Print_ISBN
    978-1-4244-4123-5
  • Type

    conf

  • DOI
    10.1109/IEMBS.2010.5627744
  • Filename
    5627744