DocumentCode :
1784738
Title :
Konnector: Connecting paired-end reads using a bloom filter de Bruijn graph
Author :
Vandervalk, Benjamin P. ; Jackman, Shaun D. ; Raymond, Anthony ; Mohamadi, Hamid ; Chen Yang ; Attali, Dean A. ; Chu, James ; Warren, Rene L. ; Birol, Inanc
Author_Institution :
Genome Sci. Centre, BC Cancer Agency, Vancouver, BC, Canada
fYear :
2014
fDate :
2-5 Nov. 2014
Firstpage :
51
Lastpage :
58
Abstract :
Paired-end sequencing yields a read from each end of a DNA fragment, typically leaving a gap of unsequenced nucleotides in the middle. Closing this gap using information from other reads in the same sequencing experiment offers the potential to generate longer “pseudo-reads” using short read sequencing platforms. Such long reads may benefit downstream applications such as de novo sequence assembly, gap filling, and variant detection. With these possible applications in mind, we have developed Konnector, a software tool that fills in the nucleotides of the sequence gap between read pairs by navigating a de Bruijn graph. Konnector represents the de Bruijn graph using a Bloom filter, a probabilistic and memory-efficient data structure. Our implementation is able to store the de Bruijn graph using a mean of 1.5 bytes of memory per k-mer, which represents a marked improvement over the typical hash table data structure. The memory usage per k-mer is independent of the k-mer length, enabling application of the tool to large genomes. We report the performance of the tool on simulated and experimental datasets, and discuss its utility for downstream analysis. Availability: Konnector is open-source software, free for academic use, released under the British Columbia Cancer Agency's academic license. The tool is included with ABySS version 1.5.2 and later, and is available for download from http://www.bcgsc.ca/platform/bioinfo/software/abyss.
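The abstract does not give Konnector's internals, so the following Python sketch only illustrates the general idea of a Bloom filter de Bruijn graph under stated assumptions. k-mers from the reads are inserted into a Bloom filter (in canonical form, an assumption); the graph is never stored explicitly, and a k-mer's successors are found by querying its four possible one-base extensions. A read pair is then connected by breadth-first search from the last k-mer of the first read to the first k-mer of the second. All names (`BloomFilter`, `build_graph`, `connect`) and parameters (`num_bits`, `num_hashes`, `max_steps`) are hypothetical, salted BLAKE2b is a stand-in hash, and the second read is assumed to be already reverse-complemented into the same orientation as the first. This is not Konnector's actual code.

```python
import hashlib
from collections import deque

_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMP)[::-1]


def canonical(kmer: str) -> str:
    """Assumed convention: store the lexicographically smaller strand."""
    return min(kmer, revcomp(kmer))


class BloomFilter:
    """Hypothetical minimal Bloom filter over strings."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # One salted BLAKE2b digest per hash function (stand-in hashing scheme).
        for seed in range(self.num_hashes):
            h = hashlib.blake2b(item.encode(), digest_size=8,
                                salt=seed.to_bytes(16, "little"))
            yield int.from_bytes(h.digest(), "little") % self.num_bits

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item: str) -> bool:
        # May return True for items never added (false positive), never False for added ones.
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))


def build_graph(reads, k: int, num_bits: int = 1 << 20, num_hashes: int = 3) -> BloomFilter:
    """Insert every canonical k-mer of every read; the edges stay implicit."""
    bloom = BloomFilter(num_bits, num_hashes)
    for read in reads:
        for i in range(len(read) - k + 1):
            bloom.add(canonical(read[i:i + k]))
    return bloom


def successors(kmer: str, bloom: BloomFilter):
    """Implicit out-edges: the one-base extensions whose canonical k-mer is in the filter."""
    for base in "ACGT":
        nxt = kmer[1:] + base
        if canonical(nxt) in bloom:
            yield nxt


def connect(left: str, right: str, bloom: BloomFilter, k: int, max_steps: int = 1000):
    """BFS from the last k-mer of `left` to the first k-mer of `right`.

    Returns the merged sequence for the first (shortest) path found, or None.
    """
    start, goal = left[-k:], right[:k]
    parent = {start: None}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            path.reverse()
            # Spell the path: first k-mer, then the last base of each later k-mer.
            bridge = path[0] + "".join(km[-1] for km in path[1:])
            return left[:-k] + bridge + right[k:]
        if depth >= max_steps:
            continue
        for nxt in successors(node, bloom):
            if nxt not in parent:
                parent[nxt] = node
                frontier.append((nxt, depth + 1))
    return None
```

Because the filter stores only hashed bit positions, its size is set by `num_bits` and not by k, which is why memory per k-mer can be independent of k-mer length; 12 bits per k-mer corresponds to the 1.5 bytes per k-mer quoted above. False positives can add spurious edges, so a practical tool has to deal with false or ambiguous paths; this sketch simply returns the first path BFS reaches.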
Keywords :
DNA; bioinformatics; data structures; genetics; genomics; probability; public domain software; ABySS version 1.5.2; Bloom filter de Bruijn graph; British Columbia Cancer Agency's academic license; DNA fragment; Konnector; de novo sequence assembly; downstream analysis; downstream applications; experimental datasets; gap filling; genomes; hash table data structure; k-mer length; memory usage; memory-efficient data structure; open-source software; paired-end reads; paired-end sequencing; probabilistic data structure; pseudo-reads; read pairs; sequence gap; sequencing experiment; short read sequencing platforms; simulated datasets; software tool; unsequenced nucleotides; variant detection; Assembly; Bioinformatics; Data structures; Filtering theory; Genomics; Random access memory; Sequential analysis; Bloom filter; de Bruijn graph; de novo genome assembly; paired-end sequencing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Bioinformatics and Biomedicine (BIBM), 2014 IEEE International Conference on
Conference_Location :
Belfast
Type :
conf
DOI :
10.1109/BIBM.2014.6999126
Filename :
6999126