• DocumentCode
    2983933
  • Title

    Rapid and Robust Denoising of Pyrosequenced Amplicons for Metagenomics

  • Author

    Byunghan Lee ; Joonhong Park ; Sungroh Yoon

  • Author_Institution
    Electr. Eng. & Comput. Sci., Seoul Nat. Univ., Seoul, South Korea
  • fYear
    2012
  • fDate
    10-13 Dec. 2012
  • Firstpage
    954
  • Lastpage
    959
  • Abstract
    Metagenomic sequencing has become a crucial tool for obtaining a gene catalogue of operational taxonomic units (OTUs) in a microbial community. High-throughput pyrosequencing is a next-generation sequencing technique that is very popular in microbial community analysis because of its longer read length compared with alternative methods. Computational tools are indispensable for processing raw data from pyrosequencers, and noise removal in particular is a critical data-mining step for obtaining robust sequence reads. However, the slow speed of existing denoisers has made denoising the bottleneck of the whole pyrosequencing process and has hindered efforts to improve robustness. To address these issues, we propose a new approach that can accelerate the denoising process substantially. With our approach, it takes only about 2 hours to denoise 62,873 pyrosequenced amplicons from a mixture of 91 full-length 16S rRNA clones, a task that would take nearly 2.5 days with existing software tools. Furthermore, our approach effectively reduces overestimation of the number of OTUs, producing 6.7 times fewer species-level OTUs on average than a state-of-the-art alternative under the same conditions. We hope that, with our approach, metagenomic sequencing will become an even more appealing tool for microbial community analysis.
  • Keywords
    biology computing; data mining; genomics; graphics processing units; molecular biophysics; 16S rRNA clone; GPU; data processing; gene catalogue; graphics processing unit; high-throughput pyrosequencing technique; metagenomic sequencing; microbial community; noise removal; operational taxonomic unit; pyrosequenced amplicons denoising; sequence read; software tool; time 2 hour; Acceleration; Communities; Instruction sets; Noise; Noise reduction; Robustness; amplicons; biomedical informatics; cluster analysis; metagenomics; pyrosequencing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining (ICDM), 2012 IEEE 12th International Conference on
  • Conference_Location
    Brussels
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4673-4649-8
  • Type

    conf

  • DOI
    10.1109/ICDM.2012.68
  • Filename
    6413826