• DocumentCode
    2341487
  • Title
    High similarity sequence comparison in clustering large sequence databases
  • Author
    Dudoignon, Lorie ; Glemet, Eric ; Heus, Hendrik Cornelis ; Raffinot, Mathieu
  • Author_Institution
    IMT, INRIA, Marseille, France
  • fYear
    2002
  • fDate
    2002
  • Firstpage
    228
  • Lastpage
    236
  • Abstract
    We present a fast algorithm for sequence clustering and searching which works with large sequence databases. It uses a strictly defined similarity measure. The algorithm is faster than conventional EST clustering approaches because its complexity is directly related to the number of subwords shared by the sequences. Furthermore, the algorithm also works with proteic sequences and large sequences like entire chromosomes. We present a theoretical study of our approach and provide experimental results.
  • Keywords
    biology computing; cellular biophysics; computational complexity; genetics; molecular biophysics; pattern clustering; scientific information systems; sequences; very large databases; chromosomes; complexity; fast algorithm; high similarity sequence comparison; large sequence database clustering; proteic sequences; sequence searching; similarity measure; subwords; Bioinformatics; Chromium; Computer Society; Databases
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics Conference, 2002. Proceedings. IEEE Computer Society
  • Print_ISBN
    0-7695-1653-X
  • Type
    conf
  • DOI
    10.1109/CSB.2002.1039345
  • Filename
    1039345