• DocumentCode
    3491269
  • Title

    Genomic signatures for metagenomic data analysis: Exploiting the reverse complementarity of tetranucleotides

  • Author

    Gori, Fabio ; Mavroeidis, Dimitrios ; Jetten, Mike S M ; Marchiori, Elena

  • Author_Institution
    iCIS, Radboud Univ. Nijmegen, Nijmegen, Netherlands
  • fYear
    2011
  • fDate
    2-4 Sept. 2011
  • Firstpage
    149
  • Lastpage
    154
  • Abstract
    Metagenomics studies microbial communities by analyzing their genomic content, sequenced directly from the environment. To this end, metagenomic datasets, consisting of many short DNA or RNA fragments, are analyzed computationally using statistical and machine learning methods, typically for binning or taxonomic annotation. Many of these methods act on features derived from the data through a genomic signature, where a typical genomic signature of a fragment is a vector whose entries specify the frequency with which oligonucleotides appear in that fragment. In this article, we experimentally analyze the ability of existing genomic signatures to facilitate discrimination between fragments belonging to different genomes. We also propose new genomic signatures that account for the fact that fragments may have been sequenced from either strand of a genome; this is achieved by exploiting the reverse complementarity of oligonucleotides. We conduct extensive experiments on in silico sampled genomic fragments to compare the effectiveness of existing genomic signatures with that of the signatures proposed in this article. The results indicate that using the reverse complementarity of tetranucleotides directly in the definition of a genomic signature yields performance comparable to that of the best existing signatures while using fewer features. The proposed genomic signatures therefore provide an alternative set of features for analyzing metagenomic data. Online supplementary material is available at http://www.cs.ru.nl/~gori/signature metagenomics/.
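
As a rough illustration of the idea described in the abstract, the following minimal sketch computes a tetranucleotide frequency vector in which each tetranucleotide is merged with its reverse complement. It is not the authors' definition, which the abstract does not give. The function names (`reverse_complement`, `canonical`, `rc_signature`), the choice of the lexicographically smaller string as the representative of each pair, and the normalization to relative frequencies are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# Assumption: fragments use the plain DNA alphabet A, C, G, T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer):
    """Hypothetical representative of a k-mer and its reverse complement:
    the lexicographically smaller of the two strings."""
    return min(kmer, reverse_complement(kmer))


def rc_signature(fragment, k=4):
    """Relative frequencies of k-mers, with each k-mer merged with its
    reverse complement. Windows containing other symbols (e.g. N) are skipped."""
    fragment = fragment.upper()
    keys = sorted({canonical("".join(p)) for p in product("ACGT", repeat=k)})
    counts = Counter()
    for i in range(len(fragment) - k + 1):
        kmer = fragment[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
    total = sum(counts.values())
    return {key: (counts[key] / total if total else 0.0) for key in keys}
```

With k = 4 this merging reduces the 256 tetranucleotides to 136 features, because 16 tetranucleotides equal their own reverse complement and the remaining 240 form 120 pairs. Under these assumptions, a fragment and its reverse complement receive the same vector, so the strand a fragment was read from does not affect its features.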
  • Keywords
    DNA; bioinformatics; biological techniques; genomics; learning (artificial intelligence); microorganisms; molecular biophysics; molecular configurations; statistical analysis; DNA fragments; RNA fragments; binning; fragment discrimination; genomic content; genomic signatures; machine learning methods; metagenomic data analysis; microbial communities; oligonucleotide frequency; statistical methods; taxonomic annotation; tetranucleotide reverse complementarity; vector; Bioinformatics; Conferences; DNA; Data analysis; Genomics; Organisms; Systems biology; genome signature; metagenome binning; metagenomic data analysis; metagenomics; taxonomic annotation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Systems Biology (ISB), 2011 IEEE International Conference on
  • Conference_Location
    Zhuhai
  • Print_ISBN
    978-1-4577-1661-4
  • Electronic_ISBN
    978-1-4577-1665-2
  • Type

    conf

  • DOI
    10.1109/ISB.2011.6033147
  • Filename
    6033147