  • DocumentCode
    3036458
  • Title
    Variable-Order de Bruijn Graphs
  • Author
    Boucher, Christina ; Bowe, Alex ; Gagie, Travis ; Puglisi, Simon J. ; Sadakane, Kunihiko
  • Author_Institution
    Dept. of Comput. Sci., Colorado State Univ., Fort Collins, CO, USA
  • fYear
    2015
  • fDate
    7-9 April 2015
  • Firstpage
    383
  • Lastpage
    392
  • Abstract
    The de Bruijn graph G_K of a set of strings S is a key data structure in genome assembly that represents overlaps between all the K-length substrings of S. Construction and navigation of the graph are a space and time bottleneck in practice and the main hurdle for assembling large genomes. This problem is compounded because state-of-the-art assemblers do not build the de Bruijn graph for a single order (value of K) but for multiple values of K: they build d de Bruijn graphs, each with a specific order, i.e., G_{K_1}, G_{K_2}, …, G_{K_d}. Although this paradigm increases the quality of the assembly produced, it greatly increases runtime, because d graphs must be constructed instead of one. In this paper, we show how to augment a succinct de Bruijn graph representation by Bowe et al. (Proc. WABI, 2012) to support new operations that let us change order on the fly, effectively representing all de Bruijn graphs of order up to some maximum K in a single data structure. Our experiments show that our variable-order de Bruijn graph only modestly increases space usage, construction time, and navigation time compared to a single-order graph.
  • Keywords
    bioinformatics; data structures; genomics; graph theory; string matching; data structure; genome assembly; k-length substring; variable-order de Bruijn graph; Assembly; Bioinformatics; Computer science; DNA; Data structures; Genomics; Navigation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Compression Conference (DCC), 2015
  • Conference_Location
    Snowbird, UT
  • ISSN
    1068-0314
  • Type
    conf
  • DOI
    10.1109/DCC.2015.70
  • Filename
    7149295
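The abstract describes G_K as representing overlaps between the K-length substrings of S and says the augmented structure lets one change order on the fly. The following is a naive, non-succinct Python sketch of one property that makes such order changes cheap in a sorted node representation; it is not the authors' implementation and does not use the rank/select machinery of Bowe et al. Assumptions: nodes are the distinct K-mers of S, an edge joins two K-mers that occur consecutively (so each edge comes from a (K+1)-mer), and all helper names (node_labels, edges, colex_sorted, suffix_interval, shorter) are hypothetical. When the order-K nodes are sorted colexicographically (by their reversed labels), all nodes whose labels share a suffix of length k ≤ K form one contiguous range. A lower-order node can therefore be represented as an interval of order-K nodes, and lowering the order only widens that interval.

```python
from bisect import bisect_left

# Assumption of this sketch: no input symbol is U+10FFFF, so appending it to a
# reversed suffix gives an upper bound for every label that ends with that suffix.
_SENTINEL = "\U0010FFFF"


def node_labels(strings, K):
    """Distinct length-K substrings of the input strings (the order-K nodes)."""
    return {s[i:i + K] for s in strings for i in range(len(s) - K + 1)}


def edges(strings, K):
    """Edge u -> v for every distinct (K+1)-mer w, with u = w[:-1] and v = w[1:],
    i.e. two K-mers that overlap by K-1 characters and occur consecutively."""
    return {(w[:-1], w[1:]) for w in node_labels(strings, K + 1)}


def colex_sorted(labels):
    """Sort labels by their reversals (colexicographic order)."""
    return sorted(labels, key=lambda x: x[::-1])


def suffix_interval(colex_nodes, suffix):
    """Half-open range [lo, hi) of the colex-sorted order-K nodes whose labels
    end with `suffix`. These nodes are contiguous, so the range can stand in
    for a single node of order len(suffix) <= K."""
    keys = [n[::-1] for n in colex_nodes]  # reversed labels, lexicographically sorted
    r = suffix[::-1]
    return bisect_left(keys, r), bisect_left(keys, r + _SENTINEL)


def shorter(colex_nodes, suffix):
    """Drop the leftmost character, moving to order len(suffix) - 1;
    the resulting interval contains the previous one."""
    return suffix[1:], suffix_interval(colex_nodes, suffix[1:])


S = ["TACGACGTCGACT"]
K = 4
nodes = colex_sorted(node_labels(S, K))
print(nodes)
# ['ACGA', 'TCGA', 'CGAC', 'CGTC', 'GACG', 'TACG', 'GTCG', 'GACT', 'ACGT']
print(suffix_interval(nodes, "ACG"))  # (4, 6): GACG, TACG
print(shorter(nodes, "ACG"))          # ('CG', (4, 7)): GACG, TACG, GTCG
print(sorted(v for u, v in edges(S, K) if u == "CGAC"))  # ['GACG', 'GACT']
```

For readability, each call rebuilds the list of reversed labels in O(n) time and stores every K-mer as a string; a compact representation would keep this ordering implicitly rather than materialising the labels.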