Title :
On the finite sample performance of the nearest neighbor classifier
Author :
Psaltis, Demetri ; Snapp, Robert R. ; Venkatesh, Santosh S.
Author_Institution :
Dept. of Electr. Eng., California Inst. of Technol., Pasadena, CA, USA
Date :
5/1/1994
Abstract :
The finite sample performance of a nearest neighbor classifier is analyzed for a two-class pattern recognition problem. An exact integral expression is derived for the m-sample risk $R_m$, given that a reference m-sample of labeled points is available to the classifier. The statistical setup assumes that the pattern classes arise in nature with fixed a priori probabilities and that points representing the classes are drawn from Euclidean n-space according to fixed class-conditional probability distributions. The sample is assumed to consist of m independently generated class-labeled points. For a family of smooth class-conditional distributions characterized by asymptotic expansions in general form, it is shown that the m-sample risk $R_m$ has a complete asymptotic series expansion $R_m \sim R_\infty + \sum_{k=2}^{\infty} c_k m^{-k/n}$ as $m \to \infty$, where $R_\infty$ denotes the nearest neighbor risk in the infinite-sample limit and the coefficients $c_k$ are distribution-dependent constants independent of the sample size m. Because the terms decay in powers of $m^{-1/n}$, convergence to $R_\infty$ slows as the dimension n grows; the analysis thus provides further analytic validation of Bellman's curse of dimensionality. Numerical simulations corroborating the formal results are included, and extensions of the theory are discussed. The analysis also contains a novel application of Laplace's asymptotic method of integration to a multidimensional integral where the integrand attains its maximum on a continuum of points.
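The m-sample risk described in the abstract can be estimated by direct simulation. The following is a minimal sketch, not the authors' procedure: it assumes equal priors and unit-variance Gaussian class-conditional densities centered at ±1 along the first coordinate (an illustrative choice only), and the names `draw_point`, `nn_label`, and `estimate_risk` are hypothetical.

```python
import random


def draw_point(n, rng, prior1=0.5, shift=1.0):
    """Draw (x, label) in R^n. Assumed model: label 1 with probability prior1,
    and x ~ N(+shift*e1, I) for label 1, N(-shift*e1, I) for label 0."""
    label = 1 if rng.random() < prior1 else 0
    center = shift if label == 1 else -shift
    x = [rng.gauss(center if i == 0 else 0.0, 1.0) for i in range(n)]
    return x, label


def nn_label(sample, x):
    """Return the label of the reference point nearest to x
    (squared Euclidean distance)."""
    nearest = min(
        sample,
        key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)),
    )
    return nearest[1]


def estimate_risk(m, n, trials=2000, seed=0):
    """Monte Carlo estimate of the m-sample risk: the probability that the
    nearest neighbor rule, given a fresh reference m-sample, mislabels a
    fresh test point."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        sample = [draw_point(n, rng) for _ in range(m)]
        x, y = draw_point(n, rng)
        if nn_label(sample, x) != y:
            errors += 1
    return errors / trials


if __name__ == "__main__":
    # Compare how the estimate changes with m in low and higher dimension.
    for n in (1, 4):
        for m in (4, 16, 64):
            print(n, m, estimate_risk(m, n))
```

Running the loop for several values of m and n gives noisy estimates of $R_m$; resolving the slow $m^{-k/n}$ decay in higher dimensions would need many more trials than this sketch uses.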
Keywords :
convergence of numerical methods; integration; pattern recognition; series (mathematics); Bellman's curse of dimensionality; Euclidean n-space; Laplace's asymptotic method of integration; asymptotic expansions; class-labeled points; distribution-dependent constants; exact integral expression; finite sample performance; fixed class-conditional probability distributions; infinite-sample limit; integrand; m-sample risk; multidimensional integral; nearest neighbor classifier; probabilities; series expansion; smooth class-conditional distributions; statistical setup; two-class pattern recognition problem; Convergence; Information theory; Multidimensional systems; Nearest neighbor searches; Numerical simulation; Pattern analysis; Pattern recognition; Performance analysis; Probability distribution; Signal processing
Journal_Title :
Information Theory, IEEE Transactions on