• DocumentCode
    640223
  • Title

    Space-efficient representation of truncated suffix trees, with applications to Markov order estimation

  • Author

    Vitale, Luciana ; Martin, Alvaro ; Seroussi, Gadiel

  • Author_Institution
    Fac. de Ing., Univ. de la Republica, Montevideo, Uruguay
  • fYear
    2013
  • fDate
    7-12 July 2013
  • Firstpage
    1924
  • Lastpage
    1928
  • Abstract
    Suffix trees (ST) are useful in information-theoretic applications such as model order estimation and lossless source coding, which require access to occurrence counts of patterns of arbitrary length in an input string x. If the length of x, n, is large, the memory required to represent the ST may become a practical performance bottleneck. This can be alleviated, in cases where a nontrivial upper bound is known on the lengths of the patterns of interest, by using a truncated ST (TST). However, conventional TST implementations still require Ω(n) bits of memory, due to the need to store x. We describe a new TST representation that avoids this limitation by storing all the information necessary to reconstruct the TST edge labels in a string y that is often much shorter than x. We apply TSTs to the implementation of Markov order estimators, where an upper bound k_n on the estimated order is either imposed (for consistency, as in KT-based MDL estimators), or can be derived (as in the BIC estimator). The new representation allows for estimator implementations with sublinear space complexity in some cases of interest. In other cases we show, experimentally, that even when the new representation does not have an asymptotic advantage, it still achieves very significant memory savings in practice.
  • Keywords
    Bayes methods; Markov processes; computational complexity; estimation theory; trees (mathematics); BIC estimator; Bayesian information criterion; KT-based MDL estimators; Markov order estimation; Markov order estimators; TST edge labels; TST representation; arbitrary length; information-theoretic applications; lossless source coding; memory savings; model order estimation; nontrivial upper bound; occurrence counts; space-efficient representation; string; sublinear space complexity; truncated ST; truncated suffix trees; Complexity theory; Estimation; Information theory; Markov processes; Memory management; Nickel; Upper bound;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Theory Proceedings (ISIT), 2013 IEEE International Symposium on
  • Conference_Location
    Istanbul
  • ISSN
    2157-8095
  • Type

    conf

  • DOI
    10.1109/ISIT.2013.6620561
  • Filename
    6620561
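
The abstract describes Markov order estimators that need occurrence counts of patterns up to a bounded length k_n. As a rough illustration only, the sketch below implements a plain BIC order estimator in Python. It is not the paper's method: it keeps the counts in a dictionary and stores the whole input string, which is exactly the Ω(n) cost the paper's TST representation is meant to avoid. The function names `context_counts` and `bic_order`, the parameter `k_max` (standing in for the bound k_n), and the simplification of scoring each order k on the n - k symbols that have a full context are all assumptions made for this example.

```python
import math
from collections import Counter


def context_counts(x, k):
    """Count (context, next symbol) pairs for contexts of length k in x."""
    counts = Counter()
    for i in range(k, len(x)):
        counts[(x[i - k:i], x[i])] += 1
    return counts


def bic_order(x, k_max, alphabet_size=None):
    """Return the order k in [0, k_max] that minimizes a BIC-style score.

    Score = negative maximized log-likelihood
            + 0.5 * (number of free parameters) * log(n),
    where an order-k model over an alphabet of size a has
    a**k * (a - 1) free parameters. Ties go to the smaller order.
    """
    n = len(x)
    a = alphabet_size or len(set(x))
    best_k, best_score = 0, float("inf")
    for k in range(k_max + 1):
        counts = context_counts(x, k)
        # Total occurrences of each context, summed over next symbols.
        totals = Counter()
        for (ctx, _), c in counts.items():
            totals[ctx] += c
        neg_log_lik = -sum(
            c * math.log(c / totals[ctx]) for (ctx, _), c in counts.items()
        )
        penalty = 0.5 * (a ** k) * (a - 1) * math.log(n)
        score = neg_log_lik + penalty
        if score < best_score:
            best_k, best_score = k, score
    return best_k
```

For instance, calling `bic_order("01" * 200, 4)` scores orders 0 through 4 on a strictly alternating binary string. Every pattern the estimator looks up has length at most `k_max + 1`, which is why a suffix tree truncated at that depth would be enough to supply the counts.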