DocumentCode
1992795
Title
An efficient data structure for applying multiple seeds in homology search
Author
Khodabakhshi, Alireza Hadj ; Mirzazadeh, Mehdi ; Gupta, Arvind
Author_Institution
Simon Fraser Univ., Burnaby
fYear
2007
fDate
14-17 Oct. 2007
Firstpage
1374
Lastpage
1378
Abstract
Homology search is a fundamental tool in genomics and proteomics research, and considerable attention has been paid to the development of faster homology-finding algorithms. The most efficient homology search algorithms known are based on finding short exact or near-exact subsequences and then extending these to form long matches. Traditional tools such as FASTA and BLAST use the consecutive seed model: a short exact match is found and then extended. More recently, PatternHunter, which uses the spaced seeds model, has shown considerably better sensitivity, and attention has been focused on the design of optimal seeds for specific data sets. In some cases, the use of multiple seeds has been shown to be necessary. Here we focus on the problem of applying multiple seeds to a dataset in an efficient manner. In particular, when multiple seeds are applied, several seeds can find the same subsequence, necessitating the removal of duplicates. We propose an efficient data structure that, given a set of seeds and two input sequences, reports the set of matching subsequences (that is, reports each matching subsequence without duplicates). We show that for any number of seeds, our algorithm is asymptotically optimal when the input sequences are longer than a certain threshold.
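To make the duplicate problem described in the abstract concrete, the following is a minimal sketch of spaced-seed matching with several seeds. It is a naive hash-index baseline, not the data structure proposed in the paper. The function names `seed_hits` and `multi_seed_hits`, the representation of a seed as a string of `1` (must match) and `0` (don't care), and the choice to identify a hit by its pair of start positions are all assumptions made for illustration.

```python
from collections import defaultdict


def seed_hits(s, t, seed):
    """Return the set of (i, j) start positions where s and t agree
    at every '1' position of the seed (hypothetical helper)."""
    care = [k for k, c in enumerate(seed) if c == "1"]
    span = len(seed)

    # Index every window of s by the characters at the seed's '1' positions.
    index = defaultdict(list)
    for i in range(len(s) - span + 1):
        index[tuple(s[i + k] for k in care)].append(i)

    # Look up every window of t in that index.
    hits = set()
    for j in range(len(t) - span + 1):
        key = tuple(t[j + k] for k in care)
        for i in index.get(key, ()):
            hits.add((i, j))
    return hits


def multi_seed_hits(s, t, seeds):
    """Apply each seed in turn and merge the hits; the set union
    discards position pairs reported by more than one seed."""
    hits = set()
    for seed in seeds:
        hits |= seed_hits(s, t, seed)
    return hits


if __name__ == "__main__":
    s = "ACGTACGT"
    t = "ACGAACGT"
    per_seed = [seed_hits(s, t, seed) for seed in ("111", "101")]
    merged = multi_seed_hits(s, t, ["111", "101"])
    # The per-seed total counts shared hits twice; the merged set does not.
    print(sum(len(h) for h in per_seed), len(merged))
```

In this sketch every seed rebuilds its own index and duplicates are removed only afterwards by the set union, so the work grows with the number of seeds. That redundancy is the inefficiency the abstract sets out to avoid.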
Keywords
biology computing; cellular biophysics; data structures; genetics; molecular biophysics; molecular configurations; proteins; data structure; genomics; homology search; matching subsequences; multiple seeds; proteomics; Bioinformatics; Data structures; Databases; Dynamic programming; Genetic mutations; Genomics; In vitro; Pattern matching; Proteomics; Robustness
fLanguage
English
Publisher
IEEE
Conference_Titel
Bioinformatics and Bioengineering, 2007. BIBE 2007. Proceedings of the 7th IEEE International Conference on
Conference_Location
Boston, MA
Print_ISBN
978-1-4244-1509-0
Type
conf
DOI
10.1109/BIBE.2007.4375750
Filename
4375750