• DocumentCode
    2742358
  • Title

    Compressed index for dynamic text

  • Author

    Hon, Wing-Kai ; Lam, Tak-Wah ; Sadakane, Kunihiko ; Sung, Wing-Kin ; Yiu, Siu-Ming

  • Author_Institution
    Dept. of Comput. Sci., Hong Kong Univ., China
  • fYear
    2004
  • fDate
    23-25 March 2004
  • Firstpage
    102
  • Lastpage
    111
  • Abstract
    This paper investigates how to index a text which is subject to updates. The best solution in the literature (P.Ferragina, et al., 1998) is based on suffix tree using O(n log n) bits of storage, where n is the length of the text. It supports finding all occurrences of a pattern P in O(|P|+occ) time, where occ is the number of occurrences. Each text update consists of inserting or deleting a substring of length y and can be supported in O(y+√(n)) time. In this paper, we initiate the study of compressed index using only O(n log |Σ|) bits of space, where Σ denotes the alphabet. Our solution supports finding all occurrences of a pattern P in O(|P| Iog2n(logεn+log|Σ|)+occlog1+εn) time, while insertion or deletion of a substring of length y can be done in O((y+√(n)) Iog2+ε n) amortized tune, where 0<ε≤1. The core part of our data structure is based on the recent work on compressed suffix trees (CST) and compressed suffix arrays (CSA).
  • Keywords
    data compression; indexing; text analysis; tree data structures; compressed index; compressed suffix arrays; compressed suffix trees; data structure; dynamic text; Computer science; DNA; Data compression; Data engineering; Entropy; Error correction; Indexing; Tree data structures;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Data Compression Conference, 2004. Proceedings. DCC 2004
  • ISSN
    1068-0314
  • Print_ISBN
    0-7695-2082-0
  • Type

    conf

  • DOI
    10.1109/DCC.2004.1281455
  • Filename
    1281455