DocumentCode
2428746
Title
Three improvements to the BLASTP search of genome databases
Author
Delaney, Shawn ; Butler, Greg ; Lam, Clement ; Thiel, Larry
Author_Institution
Dept. of Comput. Sci., Concordia Univ., Montreal, Que., Canada
fYear
2000
fDate
2000
Firstpage
14
Lastpage
24
Abstract
The BLASTP program is a search tool for databases of protein sequences that is widely used by biologists as a first step in investigating new genome sequences. BLASTP finds high-scoring local alignments ($q_i q_{i+1} \ldots q_{i+k} \,\|\, s_j s_{j+1} \ldots s_{j+k}$) without gaps between a query sequence $q$ and sequences $s$ in the database. The score of an alignment is the sum of the scores of the individual alignments $q_{i+t} \,\|\, s_{j+t}$ between the amino acids that make up the protein. These individual scores come from a scoring matrix modeling the rate of evolutionary mutation. Here we provide a detailed description of the original program and three separate optimisations to it. BLASTP consists of three steps, which we call neighbourhood construction, hit detection, and hit extension. The three optimisations target hit extension, since it accounts for 93% of the execution time. The first optimisation alters the data representation of the query sequence and the related code for indexing the scoring matrix. The second optimisation performs extensions in step-sizes of two rather than one. The third optimisation forestalls the calling of the hit extension step in cases that are unlikely to lead to a high-scoring alignment. Individually, the three optimisations show speed-ups of 15%, 48%, and 63%, respectively.
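As an illustration of the scoring rule stated in the abstract, the following is a minimal Python sketch of how an ungapped alignment score could be computed as a sum of per-residue scores. It is not the authors' implementation: the function names `toy_score` and `ungapped_score` are hypothetical, indices are 0-based, and the match/mismatch values stand in for a real scoring matrix, which the abstract does not specify.

```python
def toy_score(a: str, b: str) -> int:
    """Hypothetical stand-in for a scoring-matrix lookup: +2 match, -1 mismatch."""
    return 2 if a == b else -1


def ungapped_score(q: str, s: str, i: int, j: int, k: int, score=toy_score) -> int:
    """Sum the pair scores for q[i..i+k] aligned, without gaps, to s[j..j+k].

    Both ranges are inclusive, so k + 1 residue pairs are scored.
    """
    if k < 0 or i < 0 or j < 0 or i + k >= len(q) or j + k >= len(s):
        raise ValueError("alignment runs past the end of a sequence")
    return sum(score(q[i + t], s[j + t]) for t in range(k + 1))


if __name__ == "__main__":
    # Align the whole query against the database sequence starting at offset 1.
    print(ungapped_score("MKVLA", "GMKALA", 0, 1, 4))
```

In this toy example four of the five residue pairs match and one mismatches, so the score is 4 × 2 − 1 = 7. A real scoring matrix would replace `toy_score` with a table lookup keyed on the two amino acids.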
Keywords
data structures; database indexing; medical information systems; BLASTP search; amino acids; data representation; evolutionary mutation; genome databases; genome sequences; hit detection; hit extension; indexing; local alignments; protein sequences database; query sequence; scoring matrix; search tool; Amino acids; Bioinformatics; Biological information theory; Computer science; Databases; Genetic mutations; Genomics; Indexing; Proteins; Uninterruptible power systems;
fLanguage
English
Publisher
IEEE
Conference_Titel
Scientific and Statistical Database Management, 2000. Proceedings. 12th International Conference on
Conference_Location
Berlin
ISSN
1099-3371
Print_ISBN
0-7695-0686-0
Type
conf
DOI
10.1109/SSDM.2000.869775
Filename
869775