DocumentCode :
985315
Title :
Enhancing prototype reduction schemes with recursion: a method applicable for "large" data sets
Author :
Kim, Sang-Woon ; Oommen, B. John
Author_Institution :
Div. of Comput. Sci. & Eng., Myongji Univ., Yongin, South Korea
Volume :
34
Issue :
3
fYear :
2004
fDate :
6/1/2004
Firstpage :
1384
Lastpage :
1397
Abstract :
Most of the prototype reduction schemes (PRS) reported in the literature process the data in its entirety to yield a subset of prototypes that are useful in nearest-neighbor-like classification. Foremost among these are the prototypes for nearest neighbor classifiers, the vector quantization technique, and the support vector machines. These methods suffer from a major disadvantage, namely, the excessive computational burden incurred by processing all the data. In this paper, we suggest a recursive and computationally superior mechanism referred to as adaptive recursive partitioning (ARP)_PRS. Rather than process all the data using a PRS, we propose that the data be recursively subdivided into smaller subsets. This recursive subdivision can be arbitrary, and need not utilize any underlying clustering philosophy. The advantage of ARP_PRS is that the PRS processes subsets of data points that effectively sample the entire space to yield smaller subsets of prototypes. These prototypes are then, in turn, gathered and processed by the PRS to yield more refined prototypes. In this manner, prototypes which are in the interior of the Voronoi spaces, and thus ineffective in the classification, are eliminated at the subsequent invocations of the PRS. We are unaware of any PRS that employs such a recursive philosophy. Although we marginally forfeit accuracy in return for computational efficiency, our experimental results demonstrate that the proposed recursive mechanism yields classification accuracy comparable to that of the best prototype condensation schemes reported to date. Indeed, this is true for both artificial data sets and for samples involving real-life data sets. The results especially demonstrate that a fair computational advantage can be obtained by using such a recursive strategy for "large" data sets, such as those involved in data mining and text categorization applications.
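The recursive idea in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' ARP_PRS implementation: the abstract does not fix a particular PRS, so Hart's condensed nearest neighbor rule is swapped in as a stand-in, and the names `recursive_prs`, `condense`, `max_size`, and `parts` are hypothetical.

```python
import random


def nearest_label(point, prototypes):
    """Label of the prototype closest to `point` (squared Euclidean distance)."""
    best = min(
        prototypes,
        key=lambda p: sum((a - b) ** 2 for a, b in zip(point, p[0])),
    )
    return best[1]


def condense(samples):
    """Stand-in PRS (assumption): Hart's condensed nearest neighbor rule.

    Keeps adding samples that the current prototypes misclassify under the
    1-NN rule, until a full pass adds nothing new.
    """
    if not samples:
        return []
    prototypes = [samples[0]]
    changed = True
    while changed:
        changed = False
        for x, y in samples:
            wrong = nearest_label(x, prototypes) != y
            # The membership check guards against identical points that
            # carry conflicting labels, which would otherwise loop forever.
            if wrong and (x, y) not in prototypes:
                prototypes.append((x, y))
                changed = True
    return prototypes


def recursive_prs(samples, max_size=50, parts=2, rng=None):
    """Split arbitrarily, reduce each subset, then reduce the gathered union.

    `samples` is a list of (feature_tuple, label) pairs. The split is a plain
    shuffle, reflecting the abstract's remark that the subdivision can be
    arbitrary and need not follow any clustering.
    """
    rng = rng or random.Random(0)
    if len(samples) <= max_size:
        return condense(samples)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    gathered = []
    for i in range(parts):
        gathered.extend(recursive_prs(shuffled[i::parts], max_size, parts, rng))
    # Re-running the PRS on the gathered prototypes is the step that can
    # discard prototypes that are redundant once the subsets are merged.
    return condense(gathered)
```

Calling `recursive_prs(data)` on a list of `(features, label)` pairs returns a reduced list of pairs drawn from `data`. The subset-size threshold and the number of parts are tuning choices made here only for illustration.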
Keywords :
computational geometry; pattern recognition; recursive functions; statistical analysis; very large databases; Voronoi space; adaptive recursive partitioning; artificial data set; data mining; nearest neighbor classifier; prototype condensation scheme; real-life data set; recursive prototype reduction; support vector machine; text categorization; vector quantization; Computational efficiency; Computer science; Data mining; Nearest neighbor searches; Neural networks; Pattern recognition; Prototypes; Support vector machine classification; Support vector machines; Vector quantization; Algorithms; Artificial Intelligence; Computing Methodologies; Database Management Systems; Databases, Factual; Feedback; Income; Information Storage and Retrieval; Pattern Recognition, Automated; Reproducibility of Results; Sample Size; Sensitivity and Specificity;
fLanguage :
English
Journal_Title :
Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on
Publisher :
IEEE
ISSN :
1083-4419
Type :
jour
DOI :
10.1109/TSMCB.2004.824524
Filename :
1298888