DocumentCode
2341487
Title
High similarity sequence comparison in clustering large sequence databases
Author
Dudoignon, Lorie ; Glemet, Eric ; Heus, Hendrik Cornelis ; Raffinot, Mathieu
Author_Institution
IMT, INRIA, Marseille, France
fYear
2002
fDate
2002
Firstpage
228
Lastpage
236
Abstract
We present a fast algorithm for sequence clustering and searching that works with large sequence databases. It uses a strictly defined similarity measure. The algorithm is faster than conventional EST clustering approaches because its complexity is directly related to the number of subwords shared by the sequences. The algorithm also works with protein sequences and with large sequences such as entire chromosomes. We present a theoretical study of our approach and provide experimental results.
Keywords
biology computing; cellular biophysics; computational complexity; genetics; molecular biophysics; pattern clustering; scientific information systems; sequences; very large databases; chromosomes; complexity; fast algorithm; high similarity sequence comparison; large sequence database clustering; proteic sequences; sequence searching; similarity measure; subwords; Bioinformatics; Chromium; Computer Society; Databases
fLanguage
English
Publisher
IEEE
Conference_Titel
Bioinformatics Conference, 2002. Proceedings. IEEE Computer Society
Print_ISBN
0-7695-1653-X
Type
conf
DOI
10.1109/CSB.2002.1039345
Filename
1039345