• DocumentCode
    3225263
  • Title

    Compressed Index for Dictionary Matching

  • Author

    Hon, Wing-Kai ; Shah, Rahul ; Vitter, Jeffrey Scott ; Lam, Tak-Wah ; Tam, Siu-Lung

  • Author_Institution
    Nat. Tsing Hua Univ., Hsinchu
  • fYear
    2008
  • fDate
    25-27 March 2008
  • Firstpage
    23
  • Lastpage
    32
  • Abstract
    The past few years have witnessed several exciting results on compressed representation of a string T that supports efficient pattern matching, and the space complexity has been reduced to |T|H_k(T) + o(|T| log σ) bits, where H_k(T) denotes the kth-order empirical entropy of T, and σ is the size of the alphabet. In this paper we study compressed representation for another classical problem of string indexing, which is called dictionary matching in the literature. Precisely, a collection D of strings (called patterns) of total length n is to be indexed so that given a text T, the occurrences of the patterns in T can be found efficiently. In this paper we show how to exploit a sampling technique to compress the existing O(n)-word index to an (nH_k(D) + o(n log σ))-bit index with only a small sacrifice in search time.
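    To make the dictionary matching problem concrete, the sketch below shows one well-known uncompressed solution: an Aho-Corasick automaton, which occupies O(n) words for a dictionary D of total length n and reports every occurrence of every pattern in a text T in one left-to-right scan. It is only an illustrative baseline, not the compressed, sampling-based index described in this paper; the function names `build_automaton` and `dictionary_match` are hypothetical, and all patterns are assumed to be non-empty.

    ```python
    from collections import deque


    def build_automaton(patterns):
        """Build an Aho-Corasick automaton (uncompressed, O(n) words) for `patterns`."""
        goto = [{}]   # goto[state][char] -> child state (trie edges)
        fail = [0]    # fail[state] -> longest proper suffix that is also a trie node
        out = [[]]    # out[state] -> indices of patterns ending at this state
        for idx, pattern in enumerate(patterns):
            state = 0
            for ch in pattern:
                if ch not in goto[state]:
                    goto.append({})
                    fail.append(0)
                    out.append([])
                    goto[state][ch] = len(goto) - 1
                state = goto[state][ch]
            out[state].append(idx)

        # Breadth-first traversal: depth-1 states keep fail = 0 (the root);
        # deeper states follow their parent's failure chain.
        queue = deque(goto[0].values())
        while queue:
            s = queue.popleft()
            for ch, t in goto[s].items():
                queue.append(t)
                f = fail[s]
                while f and ch not in goto[f]:
                    f = fail[f]
                fail[t] = goto[f].get(ch, 0)
                # Inherit matches of the failure state (already final, since it is shallower).
                out[t] = out[t] + out[fail[t]]
        return goto, fail, out


    def dictionary_match(patterns, text):
        """Return (start_position, pattern) for every occurrence of every pattern in text."""
        goto, fail, out = build_automaton(patterns)
        state = 0
        hits = []
        for i, ch in enumerate(text):
            while state and ch not in goto[state]:
                state = fail[state]
            state = goto[state].get(ch, 0)
            for idx in out[state]:
                hits.append((i - len(patterns[idx]) + 1, patterns[idx]))
        return hits


    print(dictionary_match(["he", "she", "his", "hers"], "ushers"))
    # [(1, 'she'), (2, 'he'), (2, 'hers')]
    ```

    The automaton has at most n + 1 states and at most n trie edges, which is where the O(n)-word space comes from; the point of a compressed index is to bring this down to roughly nH_k(D) bits.
    
    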
  • Keywords
    computational complexity; data compression; indexing; string matching; compressed index; dictionary matching; pattern matching; sampling technique; string indexing; Bioinformatics; Data compression; Databases; Dictionaries; Entropy; Genomics; Humans; Indexing; Pattern matching; Sampling methods; Compression; Dictionary Matching; Entropy; Indexing; Pattern Matching;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Compression Conference, 2008. DCC 2008
  • Conference_Location
    Snowbird, UT
  • ISSN
    1068-0314
  • Print_ISBN
    978-0-7695-3121-2
  • Type

    conf

  • DOI
    10.1109/DCC.2008.62
  • Filename
    4483280