• DocumentCode
    3049486
  • Title
    Faster approximate string matching over compressed text
  • Author
    Navarro, Gonzalo ; Kida, Takuya ; Takeda, Masayuki ; Shinohara, Ayumi ; Arikawa, Setsuo
  • Author_Institution
    Dept. of Comput. Sci., Chile Univ., Santiago, Chile
  • fYear
    2001
  • fDate
    2001
  • Firstpage
    459
  • Lastpage
    468
  • Abstract
    Approximate string matching on compressed text was an open problem for almost a decade. The two existing solutions are very new. Although they represent important complexity breakthroughs, they are not useful in most practical cases, because they are slower than decompressing the text and then searching the uncompressed text. We present a different approach, which reduces the problem to multipattern searching of pattern pieces plus local decompression and direct verification of candidate text areas. We show experimentally that this solution is 10-30 times faster than previous work and up to three times faster than the trivial approach of decompressing and searching, making it the first practical solution to the problem.
  • Keywords
    approximation theory; data compression; search problems; string matching; text analysis; LZ78/LZW compression formats; Ziv-Lempel compressed text; approximate string matching; bit parallel technique; complexity; compressed text; local decompression; multipattern searching; searching; uncompressed text; Audio databases; Biomedical signal processing; Computational biology; Computer errors; Computer science; DNA; Pattern matching; Proteins; Signal processing algorithms; Speech recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Compression Conference, 2001. Proceedings. DCC 2001.
  • Conference_Location
    Snowbird, UT
  • ISSN
    1068-0314
  • Print_ISBN
    0-7695-1031-0
  • Type
    conf
  • DOI
    10.1109/DCC.2001.917177
  • Filename
    917177
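
The abstract describes reducing approximate matching to a multipattern search for pattern pieces followed by direct verification of candidate text areas. The sketch below illustrates that general filtering idea on plain, uncompressed text only; it does not model the LZ78/LZW processing or the local decompression step, and it is not the authors' implementation. It assumes the standard partitioning lemma (if a pattern occurs with at most k errors, at least one of its k+1 pieces occurs unchanged). The function names `split_pattern`, `sellers_ends`, and `approx_search` are hypothetical, and a repeated `str.find` stands in for a real multipattern matcher.

```python
def split_pattern(pattern, k):
    """Split pattern into k+1 contiguous pieces; return (offset, piece) pairs."""
    m = len(pattern)
    pieces = k + 1
    if m < pieces:
        raise ValueError("pattern must have at least k+1 characters")
    bounds = [i * m // pieces for i in range(pieces + 1)]
    return [(bounds[i], pattern[bounds[i]:bounds[i + 1]]) for i in range(pieces)]


def sellers_ends(pattern, text, k):
    """Return end positions (exclusive) in text where some substring
    matches pattern with edit distance <= k (Sellers' dynamic programming)."""
    m = len(pattern)
    col = list(range(m + 1))  # column for the empty text prefix
    ends = []
    for j, ch in enumerate(text, start=1):
        prev_diag = col[0]
        col[0] = 0  # a match may start at any text position
        for i in range(1, m + 1):
            old = col[i]
            cost = 0 if pattern[i - 1] == ch else 1
            col[i] = min(prev_diag + cost, old + 1, col[i - 1] + 1)
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends


def approx_search(text, pattern, k):
    """Filter with exact piece search, then verify only candidate areas."""
    m = len(pattern)
    ends = set()
    for offset, piece in split_pattern(pattern, k):
        start = text.find(piece)
        while start != -1:
            # Any occurrence containing this piece lies inside this window.
            lo = max(0, start - offset - k)
            hi = min(len(text), start - offset + m + k)
            for e in sellers_ends(pattern, text[lo:hi], k):
                ends.add(lo + e)
            start = text.find(piece, start + 1)
    return sorted(ends)


if __name__ == "__main__":
    print(approx_search("the quick brwn fox", "brown", 1))
```

In this sketch the expensive verification runs only on small windows around exact piece occurrences; by the abstract's description, the paper applies the same division of labour to compressed text, decompressing only the candidate areas.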