Abstract:
Minimum description length (MDL) model selection, in its modern normalized maximum likelihood (NML) formulation, involves a model complexity term that is equivalent to the minimax/maximin regret. When the data are discrete-valued, the complexity term is the logarithm of a sum of maximized likelihoods over all possible data sets. Because the sum has an exponential number of terms, its evaluation is in many cases intractable. In the continuous case, the sum is replaced by an integral for which a closed form is available in only a few cases. We present an approach based on Monte Carlo sampling, which works for all model classes and yields strongly consistent estimators of the minimax regret: the estimates converge almost surely to the correct value as the number of iterations increases. For the important class of Markov models, one of the presented estimators is particularly efficient: in empirical experiments, accuracy sufficient for model selection is usually achieved already on the first iteration, even for long sequences.
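To make the complexity term concrete: for a Bernoulli model class, the sum of maximized likelihoods over all 2^n binary sequences collapses to a sum over the sufficient statistic (the number of ones), so the exact value can be checked against a naive uniform-sampling Monte Carlo estimate. The Python sketch below is ours, for illustration only; it is not one of the estimators developed in the paper, and the function names (max_loglik_bernoulli, exact_log_regret, mc_log_regret) are hypothetical.

import math
import random

def max_loglik_bernoulli(x):
    """Maximized log-likelihood log P(x | theta_hat(x)) under the Bernoulli model."""
    n, k = len(x), sum(x)
    if k == 0 or k == n:
        return 0.0  # theta_hat is 0 or 1, so the maximized likelihood is 1
    p = k / n
    return k * math.log(p) + (n - k) * math.log(1.0 - p)

def exact_log_regret(n):
    """Exact log C_n: the sum over all 2^n sequences reduces to a sum over
    the count of ones k, which is feasible for this toy model class."""
    total = 0.0
    for k in range(n + 1):
        x = [1] * k + [0] * (n - k)
        total += math.comb(n, k) * math.exp(max_loglik_bernoulli(x))
    return math.log(total)

def mc_log_regret(n, iters=100_000, seed=0):
    """Monte Carlo estimate of log C_n: draw x^n uniformly from {0,1}^n and
    average 2^n * P(x | theta_hat(x)); by the strong law of large numbers
    the average converges almost surely to C_n."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(iters):
        x = [rng.randint(0, 1) for _ in range(n)]
        acc += math.exp(n * math.log(2.0) + max_loglik_bernoulli(x))
    return math.log(acc / iters)

if __name__ == "__main__":
    n = 20
    print("exact  log C_n:", exact_log_regret(n))
    print("MC est log C_n:", mc_log_regret(n))  # should agree closely with the exact value

Uniform sampling is the simplest proposal distribution that makes the estimator unbiased; the strong consistency claimed in the abstract is just the strong law of large numbers applied to the importance-weighted maximized likelihoods.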
Keywords:
Markov processes; Markov model; Monte Carlo methods; Monte Carlo estimation; Monte Carlo sampling; computational complexity; encoding; iterative methods; maximum likelihood estimation; minimax techniques; maximin regret; minimax regret; minimum description length model selection; model complexity term; normalized maximum likelihood formulation; sampling methods; sequences; data sequences; stochastic processes