High similarity sequence comparison in clustering large sequence databases

Author

Dudoignon, Lorie ; Glemet, Eric ; Heus, Hendrik Cornelis ; Raffinot, Mathieu

Author_Institution

IMT, INRIA, Marseille, France

fYear

2002

fDate

2002

Firstpage

228

Lastpage

236

Abstract

We present a fast algorithm for sequence clustering and searching that works with large sequence databases. It uses a strictly defined similarity measure. The algorithm is faster than conventional EST clustering approaches because its complexity is directly related to the number of subwords shared by the sequences. Furthermore, the algorithm also works with protein sequences and with large sequences such as entire chromosomes. We present a theoretical study of our approach and provide experimental results.

Keywords

biology computing; cellular biophysics; computational complexity; genetics; molecular biophysics; pattern clustering; scientific information systems; sequences; very large databases; chromosomes; complexity; fast algorithm; high similarity sequence comparison; large sequence database clustering; proteic sequences; sequence searching; similarity measure; subwords; Bioinformatics; Chromium; Computer Society; Databases

fLanguage

English

Publisher

IEEE

Conference_Titel

Bioinformatics Conference, 2002. Proceedings. IEEE Computer Society

Print_ISBN

0-7695-1653-X

Type

conf

DOI

10.1109/CSB.2002.1039345

Filename

1039345

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=2341487