DocumentCode :
1386923
Title :
A decision-theoretic extension of stochastic complexity and its applications to learning
Author :
Yamanishi, Kenji
Author_Institution :
NEC Res. Inst., Princeton, NJ, USA
Volume :
44
Issue :
4
fYear :
1998
fDate :
7/1/1998
Firstpage :
1424
Lastpage :
1439
Abstract :
Rissanen (1978) introduced stochastic complexity to define the amount of information in a given data sequence relative to a given hypothesis class of probability densities, where the information is measured in terms of the logarithmic loss associated with universal data compression. This paper introduces the notion of extended stochastic complexity (ESC) and demonstrates its effectiveness in the design and analysis of learning algorithms in on-line prediction and batch-learning scenarios. ESC can be thought of as an extension of Rissanen's stochastic complexity to the decision-theoretic setting, where a general real-valued function is used as a hypothesis and a general loss function is used as a distortion measure. As an application of ESC to on-line prediction, this paper shows that a sequential realization of ESC produces an on-line prediction algorithm, Vovk's aggregating strategy, which can be thought of as an extension of the Bayes algorithm. We derive upper bounds on the cumulative loss for the aggregating strategy, both in expectation and in the worst case, when the hypothesis class is continuous. As an application of ESC to batch-learning, this paper shows that a batch approximation of ESC induces a batch-learning algorithm, the minimum L-complexity algorithm (MLC), which is an extension of the minimum description length (MDL) principle. We derive upper bounds on the statistical risk of the MLC that are the tightest obtained to date. Through ESC, we give a unifying view of the most effective learning algorithms explored in computational learning theory.
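As a concrete illustration of the aggregating strategy mentioned in the abstract, the following Python sketch implements a simplified exponential-weights variant over a finite hypothesis class. It is a minimal sketch under illustrative assumptions (squared loss on [0, 1], weighted-mean prediction, learning rate eta), not the paper's general construction, which handles an arbitrary loss function via a substitution function; the function and parameter names are hypothetical.

import numpy as np

def aggregating_forecaster(hypotheses, stream, eta=0.5):
    # Simplified exponential-weights forecaster (the mixture step behind
    # Vovk's aggregating strategy), for squared loss on predictions in [0, 1].
    # hypotheses: finite list of callables x -> prediction in [0, 1]
    # stream:     iterable of (x, y) pairs with y in [0, 1]
    # eta:        learning rate; eta <= 1/2 keeps squared loss exp-concave
    w = np.ones(len(hypotheses)) / len(hypotheses)  # uniform prior over the class
    total_loss = 0.0
    for x, y in stream:
        preds = np.array([h(x) for h in hypotheses])
        y_hat = float(w @ preds)                      # weighted-mean prediction
        total_loss += (y - y_hat) ** 2
        w *= np.exp(-eta * (y - preds) ** 2)          # Bayes-like multiplicative update
        w /= w.sum()                                  # renormalize to a posterior
    return total_loss

For exp-concave losses such as squared loss at eta <= 1/2, this forecaster's cumulative loss exceeds that of the best single hypothesis by at most (ln N)/eta for a class of N hypotheses, mirroring the O(log N)-type regret of the Bayes/aggregating family analyzed in the paper.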
Keywords :
computational complexity; data compression; decision theory; learning systems; prediction theory; stochastic processes; MDL; Vovk's aggregating strategy; batch-learning algorithm; computational learning theory; data sequence; decision-theoretic; distortion measure; extended stochastic complexity; general loss function; hypothesis class; information content; learning algorithms; logarithmic loss; minimum L-complexity algorithm; minimum description length; on-line prediction algorithm; probability densities; real-valued function; statistical risk; universal data compression; upper bounds; Algorithm design and analysis; Data compression; Density measurement; Distortion measurement; Loss measurement; Prediction algorithms; Probability; Stochastic processes; Upper bound;
fLanguage :
English
Journal_Title :
IEEE Transactions on Information Theory
Publisher :
IEEE
ISSN :
0018-9448
Type :
jour
DOI :
10.1109/18.681319
Filename :
681319