DocumentCode
798915
Title
On Strong Consistency of Model Selection in Classification
Author
Suzuki, Joe
Author_Institution
Dept. of Math., Osaka Univ.
Volume
52
Issue
11
fYear
2006
Firstpage
4767
Lastpage
4774
Abstract
This paper considers model selection in classification. In many applications, such as pattern recognition, probabilistic inference using a Bayesian network, and prediction of the next symbol in a sequence based on a Markov chain, the conditional probability P(Y = y | X = x) of class y ∈ Y given attribute value x ∈ X is utilized. By model we mean the equivalence relation in X: for x, x′ ∈ X, x ~ x′ ⇔ P(Y = y | X = x) = P(Y = y | X = x′) for all y ∈ Y. By classification we mean that the number of such equivalence classes is finite. We estimate the model from n samples z^n = (x_i, y_i)_{i=1}^n ∈ (X × Y)^n using information criteria of the form empirical entropy H plus penalty term (k/2)d_n (the model minimizing H + (k/2)d_n is the estimated model), where k is the number of independent parameters in the model and {d_n}_{n=1}^∞ is a nonnegative real sequence such that lim sup_n d_n/n = 0. For autoregressive processes, although the definitions of H and k are different, it is known that the estimated model almost surely coincides with the true model as n → ∞ if {d_n}_{n=1}^∞ > {2 log log n}_{n=1}^∞, and that it does not if {d_n}_{n=1}^∞ < {2 log log n}_{n=1}^∞ (Hannan and Quinn). Whether the same property holds for classification was an open problem. This paper solves the problem in the affirmative.
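The selection rule described in the abstract can be sketched in a few lines. The following is a minimal illustration, not the paper's procedure: it assumes finite X and Y, candidate models given explicitly as partitions of X, empirical entropy measured in nats as n·H, and the Hannan–Quinn penalty weight d_n = 2 log log n; all function names are hypothetical.

```python
import math
from collections import Counter

def empirical_entropy(samples, partition):
    """n * H: empirical conditional entropy (in nats) of y given the
    equivalence class of x under `partition` (a dict x -> class label)."""
    joint = Counter((partition[x], y) for x, y in samples)
    marginal = Counter(partition[x] for x, _ in samples)
    return -sum(c * math.log(c / marginal[cls]) for (cls, _), c in joint.items())

def select_model(samples, partitions, num_y):
    """Pick the partition minimizing n*H + (k/2) d_n, with
    d_n = 2 log log n and k = (#classes) * (|Y| - 1)."""
    n = len(samples)
    d_n = 2 * math.log(math.log(n))
    def score(p):
        k = len(set(p.values())) * (num_y - 1)
        return empirical_entropy(samples, p) + 0.5 * k * d_n
    return min(partitions, key=score)
```

For example, if Y is in fact independent of X, the coarse one-class partition and the fine partition fit the data equally well, and the penalty term breaks the tie in favor of the coarse (true) model.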
Keywords
Markov processes; autoregressive processes; belief networks; entropy; pattern classification; probability; sequences; Bayesian network; Markov chain; autoregressive process; classification; conditional probability; empirical entropy; information criteria; model selection; sequence; strong consistency; Artificial intelligence; Autoregressive processes; Bayesian methods; Entropy; Intelligent networks; Mathematics; Pattern recognition; Random variables; Statistical learning; Statistics; Error probability; Hannan and Quinn's procedure; Kullback–Leibler divergence; law of the iterated logarithm; model selection; strong consistency
fLanguage
English
Journal_Title
Information Theory, IEEE Transactions on
Publisher
ieee
ISSN
0018-9448
Type
jour
DOI
10.1109/TIT.2006.883611
Filename
1715524
Link To Document