DocumentCode :
3408357
Title :
Compressed pattern matching in DNA sequences
Author :
Chen, Lei ; Lu, Shiyong ; Ram, Jeffrey
Author_Institution :
Wayne State Univ., Detroit, MI, USA
fYear :
2004
fDate :
16-19 Aug. 2004
Firstpage :
62
Lastpage :
68
Abstract :
We propose derivative Boyer-Moore (d-BM), a new compressed pattern matching algorithm for DNA sequences. The algorithm is based on the Boyer-Moore method, one of the most popular string matching algorithms. In this approach, we compress both the DNA sequences and the patterns by using two bits to represent each A, T, C, or G character. Experiments indicate that this compressed pattern matching algorithm searches for long DNA patterns (length > 50) more than 10 times faster than the exact match routine of the software package Agrep, which is known as the fastest pattern matching tool. Moreover, compressing DNA sequences by this method gives a guaranteed space saving of 75%. The enhanced speed of the algorithm is due in part to the increased efficiency of the Boyer-Moore method that results from an increase in alphabet size from 4 to 256 (each byte holds four bases, so there are 4^4 = 256 possible byte values).
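The two-bit encoding described in the abstract can be illustrated with a minimal Python sketch. The bit assignment (A=00, C=01, G=10, T=11), the high-bits-first layout, and the function names `pack` and `unpack` are assumptions made for illustration; the abstract does not state the encoding d-BM actually uses, and this sketch covers only the compression step, not the search.

```python
# Hypothetical 2-bit codes: the abstract does not give the paper's bit assignment.
CODES = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = "ACGT"  # index i is the base whose code is i


def pack(seq: str) -> bytes:
    """Pack a DNA string into bytes, four bases per byte (first base in the high bits)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODES[base]
        # Left-align a short final chunk so the unused low bits are zero.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, n: int) -> str:
    """Recover the first n bases from packed data."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases[:n])


if __name__ == "__main__":
    seq = "ACGT" * 25                 # 100 bases, 100 bytes as plain text
    packed = pack(seq)
    print(len(seq), len(packed))      # sizes before and after packing
    print(unpack(packed, len(seq)) == seq)
```

Because each packed byte carries four bases, the packed form takes one quarter of the bytes of the one-character-per-base text, which matches the 75% saving stated above. Note that the original length must be stored separately, since a final partial byte is padded with zero bits.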
Keywords :
DNA; biology computing; molecular biophysics; string matching; Agrep; DNA sequences; compressed pattern matching; derivative Boyer-Moore method; long DNA patterns; string matching algorithms; DNA; Encoding; Genetics; Huffman coding; Organisms; Pattern matching; Search methods; Sequences; Software algorithms; Software packages;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computational Systems Bioinformatics Conference, 2004. CSB 2004. Proceedings. 2004 IEEE
Print_ISBN :
0-7695-2194-0
Type :
conf
DOI :
10.1109/CSB.2004.1332418
Filename :
1332418