Author_Institution :
Helsinki Institute for Information Technology, Technical Universities of Tampere and Helsinki, and Computer Learning Research Center, University of London
Abstract :
Summary form only. Inspired by Kolmogorov's structure function for finite sets as models of data in the algorithmic theory of information, we adapt the construct to families of probability models, which avoids the noncomputability problem. The picture of modeling then looks as follows: the models in the family carry a double index, where the first specifies a structure, ranging over a finite or countable set, and the second consists of parameter values, ranging over a continuum. An optimal structure index can be determined by the MDL (Minimum Description Length) principle with a two-part code, in which the sum of the code lengths for the structure and for the data is minimized. The code length for the data is obtained from the universal NML (Normalized Maximum Likelihood) model for the subfamily of models having the specified structure. Determining the optimal model within the optimized structure is more difficult. It requires a partition of the parameter space into equivalence classes, each associated with a model, such that the Kullback-Leibler distance between any two adjacent models is the same and the models are optimally distinguishable from the given amount of data. This notion of distinguishability is a modification of a related idea of Balasubramanian. The particular model specified by the observed data is the simplest one that incorporates all the properties of the data that can be extracted with the model class considered.
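For concreteness, the two-part criterion can be written out; the display below is the standard NML construction in notation of our own choosing (f is a member of the subfamily with structure index \gamma, \hat\theta(x^n) the maximum-likelihood estimate within that subfamily, and L(\gamma) the code length for the structure index), not a formula quoted from the paper:

    \hat{p}_\gamma(x^n) = \frac{f(x^n; \hat\theta(x^n), \gamma)}{\sum_{y^n} f(y^n; \hat\theta(y^n), \gamma)},
    \qquad
    \hat\gamma = \arg\min_\gamma \bigl[ -\log \hat{p}_\gamma(x^n) + L(\gamma) \bigr],

with the sum replaced by an integral for continuous data. The logarithm of the normalizer is the parametric complexity of the subfamily: the extra code length paid for not knowing the parameters in advance.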
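As a toy illustration of the structure-selection step (our own example, not taken from the paper), the Python sketch below chooses between two hypothetical structures for a binary string: a parameter-free fair coin and the full Bernoulli family coded with NML. All code lengths are in nats, and the uniform log 2 cost for the structure index is an assumption.

    import math

    def log_binom(n, k):
        # log of the binomial coefficient C(n, k)
        return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

    def max_log_lik(k, n):
        # maximized Bernoulli log-likelihood of a binary string of
        # length n with k ones (0^0 = 1 by convention at the boundary)
        ll = 0.0
        if k > 0:
            ll += k * math.log(k / n)
        if k < n:
            ll += (n - k) * math.log((n - k) / n)
        return ll

    def bernoulli_parametric_complexity(n):
        # log C_n with C_n = sum_k C(n,k) (k/n)^k ((n-k)/n)^(n-k),
        # the NML normalizer of the Bernoulli family
        return math.log(sum(math.exp(log_binom(n, k) + max_log_lik(k, n))
                            for k in range(n + 1)))

    def select_structure(k, n):
        # structure 0: fixed fair coin, no parameters;
        # structure 1: full Bernoulli family, data coded with NML;
        # each structure index costs log 2 nats (assumed uniform cost)
        len0 = n * math.log(2) + math.log(2)
        len1 = (-max_log_lik(k, n)
                + bernoulli_parametric_complexity(n) + math.log(2))
        return (0, len0) if len0 <= len1 else (1, len1)

    print(select_structure(k=52, n=100))   # near-fair data: structure 0
    print(select_structure(k=90, n=100))   # biased data: structure 1

The fixed coin wins for near-fair data because the full family must pay its parametric complexity on top of the maximized likelihood, which is exactly the trade-off the two-part code formalizes.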
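The partition step can be sketched numerically in the same Bernoulli family. The hypothetical routine below places parameter centers so that the Kullback-Leibler distance between adjacent centers is a constant d; taking d of order 1/n is meant to make adjacent models just distinguishable from n observations, in the spirit of the modified Balasubramanian criterion, but the direction of the divergence, the starting point, and the tuning of d are all our assumptions.

    import math

    def kl_bernoulli(p, q):
        # Kullback-Leibler divergence D(p || q) between Bernoulli(p)
        # and Bernoulli(q), in nats; requires 0 < p, q < 1
        return (p * math.log(p / q)
                + (1 - p) * math.log((1 - p) / (1 - q)))

    def next_center(theta, d):
        # smallest q > theta with D(q || theta) = d, by bisection;
        # D(q || theta) is increasing in q on (theta, 1)
        lo, hi = theta, 1.0 - 1e-12
        if kl_bernoulli(hi, theta) < d:
            return None        # no further center fits before the boundary
        for _ in range(100):
            mid = (lo + hi) / 2.0
            if kl_bernoulli(mid, theta) < d:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0

    def kl_equidistant_centers(d, start=0.01):
        # representative parameter values with equal KL distance d
        # between adjacent centers, one model per equivalence class
        centers = [start]
        while (nxt := next_center(centers[-1], d)) is not None:
            centers.append(nxt)
        return centers

    n = 100
    centers = kl_equidistant_centers(d=1.0 / n)
    print(len(centers), [round(c, 3) for c in centers[:5]])

Each center stands for one equivalence class of parameters; the data then select the simplest center whose class explains them, mirroring the structure-function picture described above.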