• DocumentCode
    3225118
  • Title
    Binning DNA fragment of metagenome using a novel model
  • Author
    Hou Tao ; Liu Yun ; Liu Fu ; Wang Ke ; Xie Jian
  • Author_Institution
    Coll. of Commun. Eng., Jilin Univ., Changchun, China
  • fYear
    2015
  • fDate
    23-25 May 2015
  • Firstpage
    4760
  • Lastpage
    4765
  • Abstract
    An essential task in metagenomic data analysis is to predict the source organism of each DNA fragment in a sequenced metagenome, which can help link gene functions to members of the community or estimate the microbial abundance of the sample under study. Several classifiers have been developed to assign DNA fragments from a metagenome to their source organisms. However, most existing classifiers suffer from low classification accuracy at the genus level. A major reason is that they cannot accurately discriminate training data from different taxonomic classes when the training data contain outliers. Yet training genomic data (bacterial and archaeal genomes) usually do contain a portion of outliers, arising from sequencing errors, phage invasions, highly expressed genes, and similar sources. Treated as noise, these outliers hinder the development of better-performing classifiers. To overcome this difficulty, we present a strategy based on the support vector data description (SVDD) model, which enhances the discriminating ability of the classifier by discarding some outliers in the training genomic data. Experiments were performed on simulated and real metagenomes. The results demonstrate that our classifier achieves high classification sensitivity, specificity, and accuracy, as well as a low false negative rate.
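    The abstract describes the SVDD strategy only at a high level. The following is a minimal sketch of the general idea under stated assumptions, not the authors' implementation: it fits one Gaussian-kernel SVDD per taxonomic class with a simple pairwise (SMO-style) dual solver, treats training points whose dual weight reaches the bound C as the discarded outliers, and assigns a query to the class with the smallest ratio of squared distance to squared radius. The toy 2-D points stand in for per-fragment feature vectors (for example, k-mer composition profiles, which is itself an assumption). The names `SVDD`, `classify`, and `square`, the parameters `C` and `gamma`, and the relative-distance decision rule are all hypothetical.

```python
# Hypothetical sketch of SVDD-based outlier rejection and binning (not the authors' code).
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def rbf(x: Vector, y: Vector, gamma: float) -> float:
    """Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2); note k(x, x) = 1."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(x, y)))


class SVDD:
    """Dual SVDD: minimise a^T K a - sum_i a_i K_ii, s.t. sum_i a_i = 1, 0 <= a_i <= C.

    Training points whose weight reaches the bound C lie outside the sphere;
    they are the outliers the description gives up.
    """

    def __init__(self, C: float, gamma: float = 0.5, max_iter: int = 10000, tol: float = 1e-10):
        self.C, self.gamma, self.max_iter, self.tol = C, gamma, max_iter, tol

    def fit(self, X: List[Vector]) -> "SVDD":
        n = len(X)
        if self.C * n <= 1:
            raise ValueError("C must exceed 1/n")
        K = [[rbf(p, q, self.gamma) for q in X] for p in X]
        a = [1.0 / n] * n  # feasible start: weights sum to 1 and 1/n < C
        Ka = [sum(K[i][j] * a[j] for j in range(n)) for i in range(n)]
        for _ in range(self.max_iter):
            g = [2 * Ka[k] - K[k][k] for k in range(n)]  # gradient of the dual objective
            i = min((k for k in range(n) if a[k] < self.C), key=lambda k: g[k])  # may grow
            j = max((k for k in range(n) if a[k] > 0), key=lambda k: g[k])  # may shrink
            if g[j] - g[i] < self.tol:  # KKT conditions hold within tolerance
                break
            curv = K[i][i] + K[j][j] - 2 * K[i][j]
            step = (g[j] - g[i]) / (2 * curv) if curv > 0 else math.inf
            room = self.C - a[i]
            t = min(step, room, a[j])  # exact line search, clipped to the box
            a[i] = self.C if t == room else a[i] + t  # snap to the bound exactly
            a[j] -= t
            for k in range(n):
                Ka[k] += t * (K[k][i] - K[k][j])
        self.X, self.a = list(X), a
        self.aKa = sum(a[k] * Ka[k] for k in range(n))
        eps = 1e-9
        free = [k for k in range(n) if eps < a[k] < self.C - eps]  # on the sphere surface
        sv = free or [k for k in range(n) if a[k] > eps]
        self.R2 = sum(K[k][k] - 2 * Ka[k] + self.aKa for k in sv) / len(sv)
        self.outliers = [k for k in range(n) if a[k] >= self.C - eps]
        return self

    def dist2(self, z: Vector) -> float:
        """Squared feature-space distance from z to the centre sum_i a_i phi(x_i)."""
        cross = sum(w * rbf(x, z, self.gamma) for w, x in zip(self.a, self.X) if w > 0)
        return 1.0 - 2 * cross + self.aKa  # uses k(z, z) = 1 for the Gaussian kernel

    def contains(self, z: Vector) -> bool:
        return self.dist2(z) <= self.R2


def classify(models: Dict[str, SVDD], z: Vector) -> str:
    """Hypothetical rule: the class whose sphere z is relatively closest to (dist^2 / R^2)."""
    return min(models, key=lambda name: models[name].dist2(z) / models[name].R2)


def square(cx: float, cy: float) -> List[List[float]]:
    """Toy training cloud: four points around (cx, cy)."""
    return [[cx + 1, cy], [cx - 1, cy], [cx, cy + 1], [cx, cy - 1]]


genus_a = square(0, 0) + [[10.0, 10.0]]  # the last point plays the role of an outlier
genus_b = square(20, 0) + [[30.0, -10.0]]
models = {"A": SVDD(C=0.25).fit(genus_a), "B": SVDD(C=0.25).fit(genus_b)}
print(models["A"].outliers)           # [4]
print(classify(models, [0.2, -0.1]))  # A
print(classify(models, [19.5, 0.3]))  # B
```

    Lowering `C` (it must stay above 1/n) lets more training points be given up as outliers. The relative-distance rule is only one simple way to compare spheres of different sizes and is chosen here for illustration.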
  • Keywords
    DNA; biology computing; data analysis; genetics; genomics; microorganisms; pattern classification; support vector machines; DNA fragment; SVDD model; archaeal genomes; bacterial genomes; classification accuracy; classifiers; gene functions; genomic data; genus level; metagenomics data analysis; microbial abundance; organism; sequenced metagenome; support vector data description; taxonomic classes; Accuracy; Bioinformatics; DNA; Genomics; Sensitivity; Support vector machines; Training; Binning; Metagenomics; SVDD; Taxonomic classification;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Control and Decision Conference (CCDC), 2015 27th Chinese
  • Conference_Location
    Qingdao
  • Print_ISBN
    978-1-4799-7016-2
  • Type
    conf
  • DOI
    10.1109/CCDC.2015.7162767
  • Filename
    7162767