Title :
Three improvements to the BLASTP search of genome databases
Author :
Delaney, Shawn ; Butler, Greg ; Lam, Clement ; Thiel, Larry
Author_Institution :
Dept. of Comput. Sci., Concordia Univ., Montreal, Que., Canada
Abstract :
The BLASTP program is a search tool for databases of protein sequences that is widely used by biologists as a first step in investigating new genome sequences. BLASTP finds high-scoring local alignments $(q_i q_{i+1} \ldots q_{i+k} \,\|\, s_j s_{j+1} \ldots s_{j+k})$ without gaps between a query sequence $q$ and sequences $s$ in the database. The score of an alignment is the sum of the scores of the individual alignments $q_{i+t} \,\|\, s_{j+t}$ between the amino acids that make up the protein. These individual scores come from a scoring matrix modeling the rate of evolutionary mutation. Here we provide a detailed description of the original program and three separate optimisations to it. BLASTP consists of three steps, which we call neighbourhood construction, hit detection, and hit extension. The three optimisations target hit extension, since it accounts for 93% of the execution time. The first optimisation alters the data representation of the query sequence and the related code for indexing the scoring matrix. The second optimisation performs extensions in step sizes of two rather than one. The third optimisation forestalls the calling of the hit extension step in cases that are unlikely to lead to a high-scoring alignment. Individually, the three optimisations show speedups of 15%, 48%, and 63%, respectively.
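As an illustration of the scoring rule described in the abstract, the following minimal Python sketch sums the pairwise scores of an ungapped alignment. It is not the authors' code: the function names `toy_score` and `ungapped_score`, the 0-based indexing, and the toy scores (+2 for identical residues, -1 otherwise) are assumptions made for this example, standing in for a real substitution matrix.

```python
def toy_score(a: str, b: str) -> int:
    """Hypothetical stand-in for a scoring matrix: +2 if identical, -1 otherwise."""
    return 2 if a == b else -1


def ungapped_score(q: str, s: str, i: int, j: int, k: int, score=toy_score) -> int:
    """Score of the ungapped alignment q[i..i+k] || s[j..j+k] (0-based, inclusive).

    The result is the sum over t = 0..k of score(q[i+t], s[j+t]).
    """
    if i < 0 or j < 0 or k < 0 or i + k >= len(q) or j + k >= len(s):
        raise ValueError("alignment runs past the end of a sequence")
    return sum(score(q[i + t], s[j + t]) for t in range(k + 1))
```

For example, `ungapped_score("MKVLA", "AKVLT", 1, 1, 2)` aligns `KVL` with `KVL`, and under the toy scores each of the three identical pairs contributes +2.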
Keywords :
data structures; database indexing; medical information systems; BLASTP search; amino acids; data representation; evolutionary mutation; genome databases; genome sequences; hit detection; hit extension; indexing; local alignments; protein sequences database; query sequence; scoring matrix; search tool; Amino acids; Bioinformatics; Biological information theory; Computer science; Databases; Genetic mutations; Genomics; Indexing; Proteins; Uninterruptible power systems;
Conference_Title :
Scientific and Statistical Database Management, 2000. Proceedings. 12th International Conference on
Conference_Location :
Berlin
Print_ISBN :
0-7695-0686-0
DOI :
10.1109/SSDM.2000.869775