مرکز منطقه ای اطلاع رساني علوم و فناوري - Approximate string matching in DNA sequences

DocumentCode :

3363523

Title :

Approximate string matching in DNA sequences

Author :

Cheng, Lok-Lam ; Cheung, David W. ; Yiu, Siu-Ming

Author_Institution :

Dept. of Comput. Sci. & Inf. Syst., Hong Kong Univ., China

fYear :

2003

fDate :

26-28 March 2003

Firstpage :

303

Lastpage :

310

Abstract :

Approximate string matching on large DNA sequences data is very important in bioinformatics. Some studies have shown that suffix tree is an efficient data structure for approximate string matching. It performs better than suffix array if the data structure can be stored entirely in the memory. However our study find that suffix array is much better than suffix tree for indexing the DNA sequences since the data structure has to be created and stored on the disk due to its size. We propose a novel auxiliary data structure which greatly improves the efficiency of suffix array in the approximate string matching problem in the external memory model. The second problem we have tackled is the parallel approximate matching in DNA sequence. We propose 2 novel parallel algorithms for this problem and implement them on a PC cluster The result shows that when the error allowed is small, a direct partitioning of the array over the machines in the cluster is a more efficient approach. On the other hand, when the error allowed is large, partitioning the data over the machines is a better approach.

Keywords :

DNA; biology computing; parallel algorithms; string matching; PC cluster; approximate string matching; auxiliary data structure; bioinformatics; efficient data structure; large DNA sequences; parallel algorithms; suffix array; suffix tree; Bioinformatics; DNA; Data structures; Databases; Genomics; Humans; Indexing; Pattern matching; Sequences; Tree data structures;

fLanguage :

English

Publisher :

ieee

Conference_Titel :

Database Systems for Advanced Applications, 2003. (DASFAA 2003). Proceedings. Eighth International Conference on

Conference_Location :

Kyoto, Japan

Print_ISBN :

0-7695-1895-8

Type :

conf

DOI :

10.1109/DASFAA.2003.1192395

Filename :

1192395

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3363523