DocumentCode :
3123168
Title :
Classification with high-dimensional sparse samples
Author :
Huang, Dayu ; Meyn, Sean
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA
fYear :
2012
fDate :
1-6 July 2012
Firstpage :
2586
Lastpage :
2590
Abstract :
The task of the binary classification problem is to determine which of two distributions has generated a length-n test sequence. The two distributions are unknown; two training sequences of length N, one from each distribution, are observed. The distributions share an alphabet of size m, which is significantly larger than n and N. How does N,n,m affect the probability of classification error? We characterize the achievable error rate in a high-dimensional setting in which N,n,m all tend to infinity, under the assumption that probability of any symbol is O(m-1). The results are: 1) There exists an asymptotically consistent classifier if and only if m = o(min{N2, Nn}). This extends the previous consistency result in [1] to the case N ≠ n. 2) For the sparse sample case where max{n, N} = o(m), finer results are obtained: The best achievable probability of error decays as - log(Pe) = J min {N2, Nn}(1 +o(1))/m with J >; 0. 3) A weighted coincidence-based classifier has non-zero generalized error exponent J. 4) The ℓ2-norm based classifier has J = 0.
Keywords :
error statistics; signal classification; signal sampling; ℓ2-norm based classifier; asymptotically consistent classifier; binary classification; classification error; error decays; generalized error exponent; high dimensional sparse samples; length-n test sequence; training sequences; Approximation methods; Information theory; Random variables; Testing; Training; Vectors; Vocabulary; classification; generalized error exponent; high-dimensional model; large deviations; sparse sample;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on
Conference_Location :
Cambridge, MA
ISSN :
2157-8095
Print_ISBN :
978-1-4673-2580-6
Electronic_ISBN :
2157-8095
Type :
conf
DOI :
10.1109/ISIT.2012.6283985
Filename :
6283985
Link To Document :
بازگشت