Regional Information Center for Science and Technology (RICEST) - A source coding approach to classification by vector quantization and the principle of minimum description length

DocumentCode :

2477831

Title :

A source coding approach to classification by vector quantization and the principle of minimum description length

Author :

Li, Jia

Author_Institution :

Dept. of Stat., Pennsylvania State Univ., University Park, PA, USA

fYear :

2002

fDate :

2002

Firstpage :

382

Lastpage :

391

Abstract :

An algorithm for supervised classification using vector quantization and entropy coding is presented. The classification rule is formed from a set of training data {(X_i, Y_i)}, i = 1, ..., n, which are independent samples from a joint distribution P_XY. Based on the principle of minimum description length (MDL), a statistical model that approximates the distribution P_XY ought to enable efficient coding of X and Y. On the other hand, we expect a system that encodes (X, Y) efficiently to provide ample information on the distribution P_XY. This information can then be used to classify X, i.e., to predict the corresponding Y based on X. To encode both X and Y, a two-stage vector quantizer is applied to X and a Huffman code is formed for Y conditioned on each quantized value of X. The optimization of the encoder is equivalent to the design of a vector quantizer with an objective function reflecting the joint penalty of quantization error and misclassification rate. This vector quantizer provides an estimation of the conditional distribution of Y given X, which in turn yields an approximation to the Bayes classification rule. This algorithm, namely discriminant vector quantization (DVQ), is compared with learning vector quantization (LVQ) and CART® on a number of data sets. DVQ outperforms the other two on several data sets. The relation between DVQ, density estimation, and regression is also discussed.
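
The general idea in the abstract (quantize X, model Y within each quantization cell, classify by the most probable label in the cell) can be illustrated with a minimal sketch. This is not the DVQ algorithm from the paper: it assumes a plain single-stage Lloyd (k-means) quantizer instead of the two-stage quantizer with a joint objective, and it reports the empirical conditional entropy as an idealized stand-in for the Huffman code length of Y. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def nearest(x, codebook):
    """Index of the codeword closest to x in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(x, codebook[k])))

def train_vq_classifier(X, Y, init_codebook, n_iter=20):
    """Assumption: plain Lloyd iterations on X only, then label counts per cell."""
    codebook = [list(c) for c in init_codebook]
    for _ in range(n_iter):
        cells = [[] for _ in codebook]
        for x in X:
            cells[nearest(x, codebook)].append(x)
        for k, members in enumerate(cells):
            if members:  # an empty cell keeps its previous codeword
                codebook[k] = [sum(col) / len(members) for col in zip(*members)]
    # Empirical estimate of P(Y | cell): label counts in each cell.
    counts = [Counter() for _ in codebook]
    for x, y in zip(X, Y):
        counts[nearest(x, codebook)][y] += 1
    return codebook, counts

def classify(x, codebook, counts, default=None):
    """Predict the most frequent training label in the cell that x falls into."""
    cell = counts[nearest(x, codebook)]
    return cell.most_common(1)[0][0] if cell else default

def label_bits(counts):
    """Idealized total code length in bits for the training labels given their cells."""
    bits = 0.0
    for cell in counts:
        n = sum(cell.values())
        for c in cell.values():
            bits -= c * math.log2(c / n)
    return bits

# Toy data (hypothetical): two well-separated clusters, one label each.
X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
Y = ["a", "a", "a", "b", "b", "b"]
codebook, counts = train_vq_classifier(X, Y, init_codebook=[X[0], X[3]])
print(classify((0.3, 0.3), codebook, counts))
print(classify((4.8, 5.2), codebook, counts))
print(label_bits(counts))
```

In this sketch the quantizer is fitted without looking at Y, so cells can mix classes on harder data. The abstract describes optimizing the quantizer against a joint penalty of quantization error and misclassification rate, which is the part this simplification leaves out.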

Keywords :

Bayes methods; Huffman codes; entropy codes; optimisation; pattern classification; sampling methods; source coding; vector quantisation; Bayes classification rule; DVQ; Huffman code; MDL; conditional distribution; density estimation; discriminant vector quantization; encoder optimization; entropy coding; independent samples; joint distribution; minimum description length; misclassification rate; quantization error; regression; source coding; statistical model; supervised classification; training data; two-stage vector quantizer; Clustering algorithms; Data compression; Probability; Prototypes; Random variables; Source coding; Statistics; Testing; Training data; Vector quantization;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Data Compression Conference, 2002. Proceedings. DCC 2002

ISSN :

1068-0314

Print_ISBN :

0-7695-1477-4

Type :

conf

DOI :

10.1109/DCC.2002.999978

Filename :

999978

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2477831