DocumentCode
3225118
Title
Binning DNA fragment of metagenome using a novel model
Author
Hou Tao ; Liu Yun ; Liu Fu ; Wang Ke ; Xie Jian
Author_Institution
Coll. of Commun. Eng., Jilin Univ., Changchun, China
fYear
2015
fDate
23-25 May 2015
Firstpage
4760
Lastpage
4765
Abstract
An essential task in metagenomic data analysis is to predict the source organism of each DNA fragment in a sequenced metagenome, which can aid in linking gene functions to members of the community or in estimating the microbial abundance of the studied sample. Several classifiers have been developed to assign DNA fragments from a metagenome to their source organisms. However, most existing classifiers suffer from low classification accuracy at the genus level. One major reason is that they cannot accurately discriminate training data from different taxonomic classes when the training data contain outliers. Yet the training genomic data (bacterial and archaeal genomes) usually do contain a portion of outliers, which arise from sequencing errors, phage invasions, highly expressed genes, and similar sources. Treated as noise, these outliers hinder the development of better-performing classifiers. To overcome this difficulty, we present a strategy based on the support vector data description (SVDD) model, which enhances the discriminating ability of the classifier by discarding some outliers from the training genomic data. Experiments were performed on simulated and real metagenomes. The results demonstrate that our classifier achieves high classification sensitivity, specificity, and accuracy, as well as a low false negative rate.
Keywords
DNA; biology computing; data analysis; genetics; genomics; microorganisms; pattern classification; support vector machines; DNA fragment; SVDD model; archaeal genomes; bacterial genomes; classification accuracy; classifiers; gene functions; genomic data; genus level; metagenomics data analysis; microbial abundance; organism; sequenced metagenome; support vector data description; taxonomic classes; Accuracy; Bioinformatics; DNA; Genomics; Sensitivity; Support vector machines; Training; Binning; Metagenomics; SVDD; Taxonomic classification;
fLanguage
English
Publisher
IEEE
Conference_Titel
2015 27th Chinese Control and Decision Conference (CCDC)
Conference_Location
Qingdao
Print_ISBN
978-1-4799-7016-2
Type
conf
DOI
10.1109/CCDC.2015.7162767
Filename
7162767