DocumentCode
798915
Title
On Strong Consistency of Model Selection in Classification
Author
Suzuki, Joe
Author_Institution
Dept. of Math., Osaka Univ.
Volume
52
Issue
11
fYear
2006
Firstpage
4767
Lastpage
4774
Abstract
This paper considers model selection in classification. In many applications, such as pattern recognition, probabilistic inference using a Bayesian network, and prediction of the next symbol in a sequence based on a Markov chain, the conditional probability P(Y = y | X = x) of class y ∈ Y given attribute value x ∈ X is utilized. By model we mean the equivalence relation in X: for x, x′ ∈ X, x ~ x′ ⇔ P(Y = y | X = x) = P(Y = y | X = x′) for all y ∈ Y. By classification we mean that the number of such equivalence classes is finite. We estimate the model from n samples z^n = (x_i, y_i)_{i=1}^n ∈ (X × Y)^n using information criteria of the form empirical entropy H plus penalty term (k/2)d_n (the model minimizing H + (k/2)d_n is the estimated model), where k is the number of independent parameters in the model and {d_n}_{n=1}^∞ is a nonnegative real sequence such that lim sup_n d_n/n = 0. For autoregressive processes, although the definitions of H and k are different, it is known that the estimated model almost surely coincides with the true model as n → ∞ if {d_n}_{n=1}^∞ > {2 log log n}_{n=1}^∞, and that it does not if {d_n}_{n=1}^∞ < {2 log log n}_{n=1}^∞ (Hannan and Quinn). Whether the same property holds for classification was an open problem. This paper solves the problem in the affirmative.
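The selection rule described in the abstract can be sketched in a few lines. The following is a minimal illustration, not the paper's procedure: it assumes finite X and Y, candidate models given explicitly as partitions of X, empirical entropy measured in nats as n·H, and the Hannan–Quinn penalty weight d_n = 2 log log n; all function names are hypothetical.

```python
import math
from collections import Counter

def empirical_entropy(samples, partition):
    """n * H: empirical conditional entropy (in nats) of y given the
    equivalence class of x under `partition` (a dict x -> class label)."""
    joint = Counter((partition[x], y) for x, y in samples)
    marginal = Counter(partition[x] for x, _ in samples)
    return -sum(c * math.log(c / marginal[cls]) for (cls, _), c in joint.items())

def select_model(samples, partitions, num_y):
    """Pick the partition minimizing n*H + (k/2) d_n, with
    d_n = 2 log log n and k = (#classes) * (|Y| - 1)."""
    n = len(samples)
    d_n = 2 * math.log(math.log(n))
    def score(p):
        k = len(set(p.values())) * (num_y - 1)
        return empirical_entropy(samples, p) + 0.5 * k * d_n
    return min(partitions, key=score)
```

For example, if Y is in fact independent of X, the coarse one-class partition and the fine partition fit the data equally well, and the penalty term breaks the tie in favor of the coarse (true) model.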
Keywords
Markov processes; autoregressive processes; belief networks; entropy; pattern classification; probability; sequences; Bayesian network; Markov chain; autoregressive process; classification; conditional probability; empirical entropy; information criteria; model selection; sequence; strong consistency; Artificial intelligence; Autoregressive processes; Bayesian methods; Entropy; Intelligent networks; Mathematics; Pattern recognition; Random variables; Statistical learning; Statistics; Error probability; Hannan and Quinn's procedure; Kullback–Leibler divergence; law of the iterated logarithm; model selection; strong consistency
fLanguage
English
Journal_Title
Information Theory, IEEE Transactions on
Publisher
ieee
ISSN
0018-9448
Type
jour
DOI
10.1109/TIT.2006.883611
Filename
1715524
Link To Document