DocumentCode
3112708
Title
Bounds on estimated Markov orders of individual sequences
Author
Vitale, Luciana ; Martín, Álvaro ; Seroussi, Gadiel
Author_Institution
Inst. de Comput., Univ. de la Republica, Montevideo, Uruguay
fYear
2012
fDate
1-6 July 2012
Firstpage
1102
Lastpage
1106
Abstract
We study the maximal values estimated by commonly used Markov model order estimators on individual sequences. We start with penalized maximum likelihood (PML) estimators with cost functions of the form - log Pk(xn) + f (n)αk, where Pk (xn) is the ML probability of the input sequence xn under a Markov model of order k, a is the size of the input alphabet, and f(n) is an increasing (penalization) function of n (the popular BIC estimator corresponds to f(n) = α - 1/2 log n). Comparison with a memoryless model yields a known upper bound k(n) on the maximum order that xn can estimate. We show that, under mild conditions on f that are satisfied by commonly used penalization functions, this simple bound is not far from tight, in the following sense: for sufficiently large n, and any k<;k̅(n), there are sequences xn that estimate order k; moreover, for all but a vanishing fraction of the values of n such that k = k̅(n), there are sequences xn that estimate order k. We also study KT-based MDL Markov order estimators, and show that in this case, there are sequences xn that estimate order n1/2-ϵ, which is much larger than the maximum log n/log α(l + o(1)) attainable by BIC, or the order o(log n) required for consistency of the KT estimator. In fact, for these sequences, limiting the allowed estimated order might incur in a significant asymptotic penalty in description length. All the results are constructive, and in each case we exhibit explicit sequences that attain the claimed estimated orders.
Keywords
Markov processes; computational complexity; maximum likelihood estimation; BIC estimator; KT-based MDL Markov order estimators; PML estimators; asymptotic penalty; cost functions; description length; explicit sequences; individual sequences; input alphabet; penalized maximum likelihood estimators; Context; Electronic mail; Entropy; Markov processes; Maximum likelihood estimation; Upper bound;
fLanguage
English
Publisher
ieee
Conference_Titel
Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on
Conference_Location
Cambridge, MA
ISSN
2157-8095
Print_ISBN
978-1-4673-2580-6
Electronic_ISBN
2157-8095
Type
conf
DOI
10.1109/ISIT.2012.6283023
Filename
6283023
Link To Document