Title :
The Generalized Condensed Nearest Neighbor Rule as A Data Reduction Method
Author :
Chou, Chien-Hsing ; Kuo, Bo-Han ; Chang, Fu
Author_Institution :
Inst. of Inf. Sci., Acad. Sinica, Taipei
Abstract :
In this paper, we propose a new data reduction algorithm that iteratively selects some samples and ignores others that can be absorbed, or represented, by those selected. This algorithm differs from the condensed nearest neighbor (CNN) rule in its employment of a strong absorption criterion, in contrast to the weak criterion employed by CNN; hence, it is called the generalized CNN (GCNN) algorithm. The new criterion allows GCNN to incorporate CNN as a special case, and can achieve consistency, or asymptotic Bayes-risk efficiency, under certain conditions. GCNN, moreover, can yield significantly better accuracy than other instance-based data reduction methods. We demonstrate the last claim through experiments on five datasets, some of which contain a very large number of samples.
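The select-and-absorb loop described in the abstract can be illustrated with a minimal sketch. The abstract does not define the strong absorption criterion, so the margin test used here is an assumption: a sample counts as absorbed when its nearest prototype of another class is farther than its nearest same-class prototype by more than `rho * delta`, where `delta` is the smallest distance between two samples of different classes. The function name `condense`, the parameter `rho`, and the seeding with the first sample of each class are all hypothetical choices made for illustration, not the authors' implementation. Under this assumption, `rho = 0` reduces to a CNN-style weak criterion (a sample is kept out only if the current prototypes already classify it correctly).

```python
import numpy as np

def condense(X, y, rho=0.0):
    """Return sorted indices of the selected prototypes (illustrative sketch).

    Assumes at least two classes and 0 <= rho < 1.
    rho = 0 gives a CNN-style weak criterion; larger rho demands a margin.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    # Pairwise Euclidean distances between all samples.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

    # delta: smallest distance between two samples with different labels.
    delta = dists[y[:, None] != y[None, :]].min()

    # Seed with the first sample of each class (an arbitrary choice).
    protos = [int(np.flatnonzero(y == c)[0]) for c in np.unique(y)]

    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            if i in protos:
                continue
            d = dists[i, protos]
            same = y[protos] == y[i]
            d_same = d[same].min()    # nearest same-class prototype
            d_other = d[~same].min()  # nearest other-class prototype
            # Assumed criterion: absorbed only if the margin exceeds rho * delta.
            if d_other - d_same <= rho * delta:
                protos.append(i)      # not absorbed, so select it
                changed = True
    return sorted(protos)
```

On toy data, raising `rho` in this sketch keeps more prototypes, which mirrors the trade-off between reduction rate and accuracy that a stricter absorption criterion implies.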
Keywords :
Bayes methods; data reduction; asymptotic Bayes-risk efficiency; generalized condensed nearest neighbor algorithm; generalized condensed nearest neighbor rule; Absorption; Cellular neural networks; Clustering algorithms; Employment; Information science; Iterative algorithms; Learning systems; Nearest neighbor searches; Prototypes; Support vector machines
Conference_Title :
Pattern Recognition, 2006. ICPR 2006. 18th International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
0-7695-2521-0
DOI :
10.1109/ICPR.2006.1119