Title of article
On spaced seeds for similarity search Original Research Article
Author/Authors
Uri Keich, Ming Li, Hong-bin Ma, John Tromp
Issue Information
Journal with serial number, year 2004
Pages
11
From page
253
To page
263
Abstract
Genomics studies routinely depend on similarity searches based on the strategy of finding short seed matches (contiguous k bases), which are then extended. The choice of the seed length, k, is determined by the tradeoff between search speed (larger k reduces chance hits) and sensitivity (smaller k finds weaker similarities). The novel idea of using a single deterministic optimized spaced seed in this similarity search process was introduced in Ma et al. (Bioinformatics 18 (2002)), where it was empirically demonstrated that the optimal spaced seed quadruples the search speed without sacrificing sensitivity. Multiple, randomly spaced patterns, spaced q-grams, and spaced probes were also studied in Califano and Rigoutsos (Technical Report, IBM T.J. Watson Research Center (1995)), Burkhardt and Kärkkäinen (CPM (2001)), and Buhler (Bioinformatics 17 (2001) 419), and in other applications (RECOMB (1999) 295; RECOMB (2000) 245). They were all found to be better than their contiguous counterparts. In this paper we study some of the theoretical and practical aspects of optimal seeds. In particular, we demonstrate that the commonly used contiguous seed is in some sense the worst one, and we offer an algorithmic solution to the problem of finding the optimal seed.
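As an illustration of the terms used in the abstract, the following is a minimal sketch of what it means for a seed to "hit" a region, and of a brute-force sensitivity calculation. It is not the algorithm from the paper. The function names `seed_hits` and `sensitivity`, the 0/1 string encoding (in a seed, '1' is a position that must match and '0' is a "don't care"; in a region, '1' is a matching column), and the model in which every column matches independently with probability p are all assumptions made for this example.

```python
from itertools import product


def seed_hits(seed: str, region: str) -> bool:
    """Return True if `seed` hits `region` at some offset.

    seed:   0/1 pattern, '1' = position must match, '0' = don't care.
    region: 0/1 string, '1' = the two sequences agree at that column.
    """
    span = len(seed)
    required = [i for i, c in enumerate(seed) if c == "1"]
    # Slide the seed along the region; an empty range (region shorter
    # than the seed) yields no hit.
    return any(
        all(region[start + i] == "1" for i in required)
        for start in range(len(region) - span + 1)
    )


def sensitivity(seed: str, length: int, p: float) -> float:
    """Probability that `seed` hits a random region of the given length.

    Assumes independent columns, each matching with probability p.
    Enumerates all 2**length regions, so it is only usable for small lengths.
    """
    total = 0.0
    for bits in product("01", repeat=length):
        region = "".join(bits)
        if seed_hits(seed, region):
            matches = region.count("1")
            total += p**matches * (1 - p) ** (length - matches)
    return total
```

For example, `seed_hits("111", "1101011")` is false because the region contains no three consecutive matches, while the spaced seed `"1101"` of the same weight hits the same region at offset 0. Calling `sensitivity` with two seeds of equal weight on the same `length` and `p` gives a small-scale way to compare a contiguous seed with a spaced one under the stated model.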
Keywords
BLAST , Gapped seeds , Seeded alignment , Similarity search
Journal title
Discrete Applied Mathematics
Serial Year
2004
Record number
885846