Bounds on estimated Markov orders of individual sequences

Author

Vitale, Luciana ; Martín, Álvaro ; Seroussi, Gadiel

Author_Institution

Inst. de Comput., Univ. de la Republica, Montevideo, Uruguay

fYear

2012

fDate

1-6 July 2012

Firstpage

1102

Lastpage

1106

Abstract

We study the maximal values estimated by commonly used Markov model order estimators on individual sequences. We start with penalized maximum likelihood (PML) estimators with cost functions of the form - log P_k(xⁿ) + f (n)α^k, where P_k (xⁿ) is the ML probability of the input sequence xⁿ under a Markov model of order k, a is the size of the input alphabet, and f(n) is an increasing (penalization) function of n (the popular BIC estimator corresponds to f(n) = α - 1/2 log n). Comparison with a memoryless model yields a known upper bound k(n) on the maximum order that xⁿ can estimate. We show that, under mild conditions on f that are satisfied by commonly used penalization functions, this simple bound is not far from tight, in the following sense: for sufficiently large n, and any k<;k̅(n), there are sequences xⁿ that estimate order k; moreover, for all but a vanishing fraction of the values of n such that k = k̅(n), there are sequences xⁿ that estimate order k. We also study KT-based MDL Markov order estimators, and show that in this case, there are sequences xⁿ that estimate order n^1/2-ϵ, which is much larger than the maximum log n/log α(l + o(1)) attainable by BIC, or the order o(log n) required for consistency of the KT estimator. In fact, for these sequences, limiting the allowed estimated order might incur in a significant asymptotic penalty in description length. All the results are constructive, and in each case we exhibit explicit sequences that attain the claimed estimated orders.

Keywords

Markov processes; computational complexity; maximum likelihood estimation; BIC estimator; KT-based MDL Markov order estimators; PML estimators; asymptotic penalty; cost functions; description length; explicit sequences; individual sequences; input alphabet; penalized maximum likelihood estimators; Context; Electronic mail; Entropy; Markov processes; Maximum likelihood estimation; Upper bound;

fLanguage

English

Publisher

ieee

Conference_Titel

Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on

Conference_Location

Cambridge, MA

ISSN

2157-8095

Print_ISBN

978-1-4673-2580-6

Electronic_ISBN

2157-8095

Type

conf

DOI

10.1109/ISIT.2012.6283023

Filename

6283023