• DocumentCode
    1992795
  • Title

    An efficient data structure for applying multiple seeds in homology search

  • Author

    Khodabakhshi, Alireza Hadj ; Mirzazadeh, Mehdi ; Gupta, Arvind

  • Author_Institution
    Simon Fraser Univ., Burnaby
  • fYear
    2007
  • fDate
    14-17 Oct. 2007
  • Firstpage
    1374
  • Lastpage
    1378
  • Abstract
    Homology search is a fundamental tool in genomics and proteomics research and considerable attention has been paid to the development of faster homology finding algorithms. The most efficient homology search algorithms known are based on finding short exact or near exact subsequences and then extending these to form long matches. Traditional tools such as FASTA and BLAST use the consecutive seed model, that is, an exact short match is found and then extended. More recently, PatternHunter, which uses the spaced seeds model, has shown considerably better sensitivity and attention has been focused on the design of optimal seeds for specific data sets. In some cases, the use of multiple seeds has been shown to be necessary. Here we focus on the problem of applying multiple seeds to a dataset in an efficient manner. In particular, when multiple seeds are applied, each seed can result in the same subsequence being found necessitating the removal of duplicates. We propose an efficient data structure that, given a set of seeds and two input sequences, reports the set of matching subsequences (that is, reports matching subsequences without duplications). We show that for any number of seeds, our algorithm is asymptotically optimal when the input sequences are longer than a certain threshold.
  • Keywords
    biology computing; cellular biophysics; data structures; genetics; molecular biophysics; molecular configurations; proteins; data structure; genomics; homology search; matching subsequences; multiple seeds; proteomics; Bioinformatics; Data structures; Databases; Dynamic programming; Genetic mutations; Genomics; In vitro; Pattern matching; Proteomics; Robustness;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Bioinformatics and Bioengineering, 2007. BIBE 2007. Proceedings of the 7th IEEE International Conference on
  • Conference_Location
    Boston, MA
  • Print_ISBN
    978-1-4244-1509-0
  • Type

    conf

  • DOI
    10.1109/BIBE.2007.4375750
  • Filename
    4375750