DocumentCode
557543
Title
Gene prediction in metagenomic fragments based on the SVM algorithm
Author
Liu, Yongchu ; Guo, Jiangtao ; Zhu, Huaiqiu
Author_Institution
Dept. of Biomed. Eng., Peking Univ., Beijing, China
Volume
3
fYear
2011
fDate
15-17 Oct. 2011
Firstpage
1738
Lastpage
1742
Abstract
Metagenomic sequencing is becoming a powerful method for exploring environmental organisms without isolation and cultivation. The genomic sequence data generated by this technology are growing explosively, while computational methods for analyzing them are still urgently needed. One of the first and most important steps is exhaustive gene prediction. Because metagenomic sequences are short, anonymous DNA fragments, their assembly usually has no fixed end point at which complete genomes are obtained, and assembly is often not available at all. This makes annotation more complicated than in complete genomes. Here, we present a newly developed SVM-based algorithm that comprises a supervised universal model and a data-specific novel model. It uses entropy density profiles of codon usage, translation initiation signal scoring, and open reading frame length for model training. Tests on fixed-length artificial shotgun sequences of 700 bp showed an average sensitivity of 94.7% and an average specificity of 94.9%, which indicates that our method performs better overall than the best current gene prediction methods. When applied to two metagenomic samples from the human gut community, it predicts thousands of additional genes. Furthermore, compared with other gene predictors, our algorithm predicts the most potential novel genes.
Keywords
DNA; biology computing; genomics; molecular biophysics; support vector machines; DNA fragments; SVM algorithm; codon usage; data-specific novel model; entropy density profile; fixed-length artificial shotgun sequence; gene prediction method; genomic sequence data; human gut community; metagenomic fragment; metagenomic sample; metagenomic sequence; model training; open read frame length; supervised universal model; translation initiation signal scoring; Bioinformatics; DNA; Databases; Genomics; Microorganisms; Proteins; Training;
fLanguage
English
Publisher
IEEE
Conference_Titel
Biomedical Engineering and Informatics (BMEI), 2011 4th International Conference on
Conference_Location
Shanghai
Print_ISBN
978-1-4244-9351-7
Type
conf
DOI
10.1109/BMEI.2011.6098588
Filename
6098588