• DocumentCode
    3663123
  • Title

    Do read errors matter for genome assembly?

  • Author

    Ilan Shomorony;Thomas Courtade;David Tse

  • Author_Institution
    UC Berkeley, USA
  • fYear
    2015
  • fDate
    6/1/2015 12:00:00 AM
  • Firstpage
    919
  • Lastpage
    923
  • Abstract
    While most current high-throughput DNA sequencing technologies generate short reads with low error rates, emerging sequencing technologies generate long reads with high error rates. A basic question of interest is the tradeoff between read length and error rate in terms of the information needed for the perfect assembly of the genome. Using an adversarial erasure error model, we make progress on this problem by establishing a critical read length, as a function of the genome and the error rate, above which perfect assembly is guaranteed. For several real genomes, including those from the GAGE dataset, we verify that this critical read length is not significantly greater than the read length required for perfect assembly from reads without errors.
  • Keywords
    "Assembly","Bioinformatics","Genomics","Sequential analysis","Error analysis","DNA","Noise measurement"
  • Publisher
    IEEE
  • Conference_Title
    Information Theory (ISIT), 2015 IEEE International Symposium on
  • Electronic_ISBN
    2157-8117
  • Type

    conf

  • DOI
    10.1109/ISIT.2015.7282589
  • Filename
    7282589