Abstract :
Text indexing is a fundamental algorithmic problem in which one wishes to preprocess a text in order to quickly locate pattern queries within it. In the ever-evolving world of dynamic and on-line data, there is also a need for solutions that index texts arriving on-line, i.e., one character at a time, while still locating such patterns quickly. In this paper, a new solution for on-line indexing is presented by providing an on-line suffix tree construction in O(log log n + log log |Σ|) worst-case expected time per character, where n is the size of the string and Σ is the alphabet. This improves upon all previously known on-line suffix tree constructions for general alphabets, at the cost of the running time holding only in expectation. The main idea is to reduce the problem of constructing a suffix tree on-line to an interesting variant of the order maintenance problem, which may be of independent interest. In the well-known order maintenance problem, one wishes to maintain a dynamic list L of size n under insertions, deletions, and order queries. In an order query, one is given two nodes from L and must determine which node precedes the other in L. In an extension of this problem, named the Predecessor search on Dynamic Subsets of an Ordered Dynamic List problem (POLP for short), it is also necessary to maintain dynamic subsets S_1, ..., S_k ⊆ L such that, given some u ∈ L, it is possible to quickly locate the predecessor of u in S_i, for any integer 1 ≤ i ≤ k. This paper provides an efficient data structure capable of locating the predecessor of u in S_i in O(log log n) worst-case time and answering order queries on L in O(1) worst-case time, while allowing updates to L in O(1) worst-case expected time and updates to the subsets in O(log log n) worst-case expected time.
This improves on a previous data structure that may be implicitly obtained from Dietz [8], in which the updates to the sets and to L are done in O(log log n) amortized expected time. In addition, the bounds shown here match the currently best known bounds for predecessor search in the RAM model. Furthermore, this paper improves or simplifies bounds for several additional applications, including fully-persistent arrays, the monotonic list labeling problem, and the order maintenance problem.
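
To make the POLP operations described above concrete, the following is a minimal Python sketch of their semantics only. It is a naive, linear-time reference model and not the data structure of this paper, so it achieves none of the stated bounds. The class name `NaivePOLP` and all method names are hypothetical. The sketch also assumes that "predecessor of u in S_i" means the last member of S_i that strictly precedes u in L, and it stores the subsets with 0-based indices.

```python
class NaivePOLP:
    """Naive model of POLP semantics. Every operation here is O(n);
    this is an illustration, not the structure from the paper."""

    def __init__(self, k):
        self.order = []                            # the ordered dynamic list L
        self.subsets = [set() for _ in range(k)]   # S_1..S_k, stored 0-indexed

    def insert_after(self, prev, node):
        """Insert node right after prev in L (prev=None inserts at the front)."""
        pos = 0 if prev is None else self.order.index(prev) + 1
        self.order.insert(pos, node)

    def delete(self, node):
        """Remove node from L and from every subset that contains it."""
        self.order.remove(node)
        for subset in self.subsets:
            subset.discard(node)

    def precedes(self, u, v):
        """Order query: True if u comes before v in L."""
        return self.order.index(u) < self.order.index(v)

    def add_to_subset(self, i, node):
        self.subsets[i].add(node)

    def remove_from_subset(self, i, node):
        self.subsets[i].discard(node)

    def predecessor(self, i, u):
        """Last member of subset i strictly before u in L, or None.
        (Strictness is an assumption of this sketch.)"""
        pos = self.order.index(u)
        for node in reversed(self.order[:pos]):
            if node in self.subsets[i]:
                return node
        return None


polp = NaivePOLP(k=2)
polp.insert_after(None, "a")
polp.insert_after("a", "c")
polp.insert_after("a", "b")      # L is now a, b, c
polp.add_to_subset(0, "a")
polp.add_to_subset(1, "b")
print(polp.precedes("a", "c"))   # order query on L
print(polp.predecessor(1, "c"))  # predecessor of c within subset 1
```

The sketch only fixes the interface. The contribution summarized above is supporting these same operations with O(1) order queries and O(log log n) predecessor queries.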
Keywords :
formal languages; indexing; query processing; text analysis; trees (mathematics); POLP; RAM model; algorithmic problem; deletions; fully-persistent arrays; general alphabets; insertions; monotonic list labeling problem; online indexing; online suffix tree construction; order maintenance problem; order query answering; ordered list subset; pattern queries; predecessor queries; predecessor search on dynamic subsets of an ordered dynamic list problem; text indexing; Arrays; Heuristic algorithms; Indexing; Maintenance engineering; Random access memory; Silicon; data structures; order-maintenance; pattern matching; predecessor; suffix tree;