• DocumentCode
    3147633
  • Title
    Error Correction and Clustering Algorithms for Next Generation Sequencing
  • Author
    Yang, Xiao
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Iowa State Univ., Ames, IA, USA
  • fYear
    2011
  • fDate
    16-20 May 2011
  • Firstpage
    2101
  • Lastpage
    2104
  • Abstract
    Next generation sequencing (NGS) revolutionized genomic data generation by enabling high-throughput parallel sequencing, making large-scale genomic data analysis a crucial task. To improve NGS data quality, we developed an efficient algorithm that uses a flexible read decomposition method to improve the accuracy of error correction. We further proposed a statistical framework to differentiate infrequently observed subreads from sequencing errors, given the prevalence of genomic repeats. To enable the analysis of microbial organism composition in environmental samples, we developed a parallel solution for metagenomic sequence clustering that integrates sketching, quasi-clique enumeration, and MapReduce techniques.
  • Keywords
    bioinformatics; cloud computing; data analysis; error correction; genomics; pattern clustering; sequences; MapReduce techniques; NGS data quality; error correction; flexible read decomposition method; genomic data analysis; genomic data generation; high-throughput parallel sequencing; metagenomic sequence clustering; microbial organism composition; next generation sequencing; quasiclique enumeration; Bioinformatics; Clustering algorithms; Error correction; Genomics; High definition video; Memory management; Tiles;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW), 2011 IEEE International Symposium on
  • Conference_Location
    Shanghai
  • ISSN
    1530-2075
  • Print_ISBN
    978-1-61284-425-1
  • Electronic_ISBN
    1530-2075
  • Type
    conf
  • DOI
    10.1109/IPDPS.2011.387
  • Filename
    6009098