Title of article :
On spaced seeds for similarity search (Original Research Article)
Author/Authors :
Uri Keich, Ming Li, Hong-bin Ma, John Tromp
Issue Information :
Journal with serial number, year 2004
Pages :
11
From page :
253
To page :
263
Abstract :
Genomics studies routinely depend on similarity searches that find short seed matches (k contiguous matching bases) and then extend them. The choice of the seed length k is determined by a tradeoff between search speed (larger k reduces chance hits) and sensitivity (smaller k finds weaker similarities). Ma et al. (Bioinformatics (2002) 18) introduced the novel idea of using a single deterministic, optimized spaced seed in this search process and demonstrated empirically that the optimal spaced seed quadruples the search speed without sacrificing sensitivity. Multiple randomly spaced patterns, spaced q-grams, and spaced probes were also studied by Califano and Rigoutsos (Technical Report, IBM T.J. Watson Research Center (1995)), Burkhardt and Kärkkäinen (CPM (2001)), and Buhler (Bioinformatics 17 (2001) 419), and in other applications (RECOMB (1999) 295, RECOMB (2000) 245). They were all found to be better than their contiguous counterparts. In this paper we study some of the theoretical and practical aspects of optimal seeds. In particular, we demonstrate that the commonly used contiguous seed is in some sense the worst one, and we offer an algorithmic solution to the problem of finding the optimal seed.
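The sensitivity comparison described in the abstract can be illustrated with a small sketch. It is not the authors' algorithm. It assumes a simplified model in which each position of a homologous region matches independently with probability p, and it computes the chance that a seed (written as a string of 1s for required matches and 0s for "don't care" positions) hits at least once. The function name `hit_probability`, the seed strings, the region length, and the value of p are all hypothetical choices made for illustration.

```python
def hit_probability(seed: str, length: int, p: float) -> float:
    """Probability that `seed` hits a random region of `length` positions.

    Assumed model (not taken from the abstract): every position matches
    independently with probability p. A '1' in the seed requires a match,
    a '0' is a "don't care" position.
    """
    span = len(seed)
    care = [i for i, c in enumerate(seed) if c == "1"]
    # Map: last (span - 1) match bits seen so far -> probability mass
    # of all prefixes that end this way and have not been hit yet.
    no_hit = {(): 1.0}
    for _ in range(length):
        nxt = {}
        for history, mass in no_hit.items():
            for bit, prob in ((1, p), (0, 1.0 - p)):
                window = history + (bit,)
                if len(window) == span and all(window[i] for i in care):
                    continue  # seed hit: this mass leaves the no-hit set
                key = window[-(span - 1):] if span > 1 else ()
                nxt[key] = nxt.get(key, 0.0) + mass * prob
        no_hit = nxt
    return 1.0 - sum(no_hit.values())


if __name__ == "__main__":
    # Hypothetical comparison: two seeds of equal weight (4 required matches).
    for seed in ("1111", "110011"):
        print(seed, hit_probability(seed, length=32, p=0.7))
```

The state space grows as 2^(span - 1), so this direct approach is only practical for short seeds; it is meant to make the notion of seed sensitivity concrete, not to search for an optimal seed.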
Keywords :
BLAST , Gapped seeds , Seeded alignment , Similarity search
Journal title :
Discrete Applied Mathematics
Serial Year :
2004
Record number :
885846