• DocumentCode
    3112708
  • Title
    Bounds on estimated Markov orders of individual sequences
  • Author
    Vitale, Luciana ; Martín, Álvaro ; Seroussi, Gadiel
  • Author_Institution
    Inst. de Comput., Univ. de la Republica, Montevideo, Uruguay
  • fYear
    2012
  • fDate
    1-6 July 2012
  • Firstpage
    1102
  • Lastpage
    1106
  • Abstract
    We study the maximal values estimated by commonly used Markov model order estimators on individual sequences. We start with penalized maximum likelihood (PML) estimators with cost functions of the form $-\log P_k(x^n) + f(n)\,\alpha^k$, where $P_k(x^n)$ is the ML probability of the input sequence $x^n$ under a Markov model of order $k$, $\alpha$ is the size of the input alphabet, and $f(n)$ is an increasing (penalization) function of $n$ (the popular BIC estimator corresponds to $f(n) = \frac{\alpha-1}{2}\log n$). Comparison with a memoryless model yields a known upper bound $\bar{k}(n)$ on the maximum order that $x^n$ can estimate. We show that, under mild conditions on $f$ that are satisfied by commonly used penalization functions, this simple bound is not far from tight, in the following sense: for sufficiently large $n$, and any $k < \bar{k}(n)$, there are sequences $x^n$ that estimate order $k$; moreover, for all but a vanishing fraction of the values of $n$ such that $k = \bar{k}(n)$, there are sequences $x^n$ that estimate order $k$. We also study KT-based MDL Markov order estimators, and show that in this case, there are sequences $x^n$ that estimate order $n^{1/2-\epsilon}$, which is much larger than the maximum $\frac{\log n}{\log \alpha}(1 + o(1))$ attainable by BIC, or the order $o(\log n)$ required for consistency of the KT estimator. In fact, for these sequences, limiting the allowed estimated order might incur a significant asymptotic penalty in description length. All the results are constructive, and in each case we exhibit explicit sequences that attain the claimed estimated orders.
  • Keywords
    Markov processes; computational complexity; maximum likelihood estimation; BIC estimator; KT-based MDL Markov order estimators; PML estimators; asymptotic penalty; cost functions; description length; explicit sequences; individual sequences; input alphabet; penalized maximum likelihood estimators; Context; Electronic mail; Entropy; Markov processes; Maximum likelihood estimation; Upper bound;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on
  • Conference_Location
    Cambridge, MA
  • ISSN
    2157-8095
  • Print_ISBN
    978-1-4673-2580-6
  • Electronic_ISBN
    2157-8095
  • Type
    conf
  • DOI
    10.1109/ISIT.2012.6283023
  • Filename
    6283023
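
The penalized maximum likelihood (PML) rule described in the abstract, which picks the order k minimizing -log P_k(x^n) + f(n) α^k, can be illustrated with a minimal sketch. This is not the authors' code. The function names (`neg_log_ml`, `pml_order`, `bic_penalty`), the `max_k` cap, the use of base-2 logarithms, and the handling of the first k symbols by padding with a fixed symbol `pad` are all assumptions made here for illustration; the record does not specify these conventions.

```python
import math
from collections import Counter


def neg_log_ml(x, k, pad=0):
    """Return -log2 of the ML probability of x under an order-k Markov model.

    Assumption: the sequence is preceded by k copies of `pad`, so every
    symbol has a full length-k context.
    """
    padded = [pad] * k + list(x)
    ctx_sym = Counter()  # occurrences of (context, next symbol)
    ctx = Counter()      # occurrences of each context
    for i in range(k, len(padded)):
        s = tuple(padded[i - k:i])
        ctx_sym[(s, padded[i])] += 1
        ctx[s] += 1
    # The ML parameters are the empirical conditional frequencies.
    return -sum(c * math.log2(c / ctx[s]) for (s, _), c in ctx_sym.items())


def bic_penalty(alphabet_size):
    """BIC penalization function f(n) = (alpha - 1)/2 * log n."""
    return lambda n: (alphabet_size - 1) / 2 * math.log2(n)


def pml_order(x, alphabet_size, f, max_k):
    """Return the smallest k in 0..max_k minimizing the PML cost."""
    n = len(x)
    best_k, best_cost = 0, float("inf")
    for k in range(max_k + 1):
        cost = neg_log_ml(x, k) + f(n) * alphabet_size ** k
        if cost < best_cost:  # strict comparison keeps the smallest minimizer
            best_k, best_cost = k, cost
    return best_k


if __name__ == "__main__":
    x = [0, 1] * 32  # alternating binary sequence, n = 64
    print(pml_order(x, alphabet_size=2, f=bic_penalty(2), max_k=4))
```

The penalty term grows as α^k, so for a fixed n only small orders can beat the memoryless (k = 0) cost; this is the comparison that yields the upper bound on the estimated order mentioned in the abstract.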