Title :
Exploration of Short Reads Genome Mapping in Hardware
Author :
Fernandez, Edward ; Najjar, Walid ; Harris, Elena ; Lonardi, Stefano
Author_Institution :
Dept. of Comput. Sci. & Eng., Univ. of California, Riverside, Riverside, CA, USA
Date :
Aug. 31 2010-Sept. 2 2010
Abstract :
The newest generation of sequencing instruments, such as the Illumina/Solexa Genome Analyzer and ABI SOLiD, can generate hundreds of millions of short DNA “reads” from a single run. These reads must be matched against a reference genome to identify their original locations. Because of sequencing errors and variations in the sequenced genome, the matching procedure must allow a variable but limited number of mismatches. This problem is a version of the classic approximate string matching problem, in which a long text is searched for occurrences of a set of short patterns. Typical strategies to speed up the matching involve elaborate hashing schemes that exploit the inherent repetitions in the data. However, such large data structures are not well suited to FPGA implementations. In this paper we evaluate an FPGA implementation that uses a “naive” approach, which checks every possible read-genome alignment. We compare the performance of the naive approach with that of popular software tools currently used to map short reads to a reference genome, and show a speedup of up to 4X over the fastest software tool.
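The “naive” approach described in the abstract can be illustrated in software with a minimal sketch. This is not the authors' hardware design: it assumes that mismatches are counted as substitutions only (Hamming distance, no insertions or deletions), that only the forward strand is searched, and the names `count_mismatches`, `map_reads_naive`, and `max_mismatches` are hypothetical, chosen for illustration.

```python
from typing import Dict, List, Optional, Tuple


def count_mismatches(read: str, window: str, max_mismatches: int) -> Optional[int]:
    """Return the number of mismatching positions, or None once the limit is exceeded."""
    mismatches = 0
    for read_base, genome_base in zip(read, window):
        if read_base != genome_base:
            mismatches += 1
            if mismatches > max_mismatches:
                return None  # stop early: this alignment cannot qualify
    return mismatches


def map_reads_naive(
    genome: str, reads: List[str], max_mismatches: int
) -> Dict[str, List[Tuple[int, int]]]:
    """Check every read against every genome offset (assumed substitution-only model).

    Returns, for each read, a list of (start_position, mismatch_count) pairs.
    """
    hits: Dict[str, List[Tuple[int, int]]] = {}
    for read in reads:
        positions = []
        # Slide the read across every possible alignment in the genome.
        for start in range(len(genome) - len(read) + 1):
            window = genome[start:start + len(read)]
            distance = count_mismatches(read, window, max_mismatches)
            if distance is not None:
                positions.append((start, distance))
        hits[read] = positions
    return hits


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    example = map_reads_naive("ACGTACGTTACG", ["ACGT", "TTAC"], max_mismatches=1)
    for read, positions in example.items():
        print(read, positions)
```

The work grows with the product of genome length, number of reads, and read length, which is why software tools avoid this strategy. Each read-versus-window comparison is independent of the others, however, so the comparisons can in principle run in parallel.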
Keywords :
bioinformatics; cellular biophysics; field programmable gate arrays; genomics; logic design; parallel architectures; string matching; ABI SOLiD; FPGA implementation; Illumina-Solexa genome analyzer; genome mapping; hashing schemes; large data structures; read-genome alignment; sequenced genome; sequencing errors; software tools; reconfigurable computing
Conference_Title :
Field Programmable Logic and Applications (FPL), 2010 International Conference on
Conference_Location :
Milano
Print_ISBN :
978-1-4244-7842-2
DOI :
10.1109/FPL.2010.78