Title :
Pruning Nearest Neighbor Competence Preservation Learners
Author :
Fabrizio Angiulli;Estela Narvaez
Author_Institution :
DIMES, Univ. of Calabria, Rende, Italy
Abstract :
The nearest neighbor classification rule is a memory-based technique, in that its standard learning phase consists of storing the entire set of examples, or training set. During classification, the nearest neighbors of the incoming test object are retrieved from the store and their labels are combined to determine the answer. In order to alleviate both the spatial and temporal cost of this strategy, competence preservation techniques aim at substituting the training set with a selected subset, also known as a consistent subset, having the property of correctly classifying all the discarded training set examples. Thus, the consistent subset becomes a model of the original training set. Motivated by approaches used in the context of other classification algorithms (such as decision trees) to improve generalization and to prevent induction of overly complex models, in this study we investigate the application of the Pessimistic Error Estimate (PEE) principle in the context of the nearest neighbor rule. Specifically, we relax the notion of consistency of a subset and estimate subset generalization as a trade-off between its training set accuracy and its complexity. As major results, we show that a PEE-like selection strategy is guaranteed to preserve the accuracy of the consistent subset with a far larger reduction factor and, moreover, that appreciable generalization improvements can be obtained by using a reduced subset of intermediate size.
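The baseline the abstract builds on is the classical consistent-subset idea (Hart-style condensation): grow a subset of the training set until the 1-NN rule over the subset correctly classifies every discarded example. The sketch below illustrates only this baseline on synthetic data; the paper's PEE-relaxed selection strategy, which trades training-set accuracy against subset size, is not reproduced here, and all function names and data are illustrative assumptions.

```python
import numpy as np

def nn_predict(subset_X, subset_y, x):
    """Classify x by its single nearest neighbor in the current subset."""
    d = np.linalg.norm(subset_X - x, axis=1)
    return subset_y[np.argmin(d)]

def condensed_nn(X, y, seed=0):
    """Hart-style condensation: add each misclassified example to the
    subset and repeat until a full pass makes no change, yielding a
    subset that correctly classifies all discarded training examples."""
    order = np.random.default_rng(seed).permutation(len(X))
    keep = [order[0]]
    changed = True
    while changed:
        changed = False
        for i in order:
            if i in keep:
                continue
            if nn_predict(X[keep], y[keep], X[i]) != y[i]:
                keep.append(i)
                changed = True
    return sorted(keep)

# Toy two-class data: two well-separated Gaussian blobs (illustrative).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 0.3, (40, 2)), rng.normal(3, 0.3, (40, 2))])
y = np.array([0] * 40 + [1] * 40)

subset = condensed_nn(X, y)
# Consistency check: the subset classifies every training example correctly.
errors = sum(nn_predict(X[subset], y[subset], x) != t for x, t in zip(X, y))
print(len(subset), errors)
```

On well-separated data the condensed subset is typically a small fraction of the training set while remaining consistent; the paper's contribution is to relax exactly this consistency requirement via a PEE-like accuracy/complexity trade-off to obtain still smaller subsets.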
Keywords :
"Training","Silicon","Decision trees","Complexity theory","Context","Computational modeling","Measurement"
Conference_Title :
2015 IEEE 27th International Conference on Tools with Artificial Intelligence (ICTAI)
DOI :
10.1109/ICTAI.2015.136