DocumentCode :
1143805
Title :
Handwritten character classification using nearest neighbor in large databases
Author :
Smith, Stephen J. ; Bourgoin, Mario O. ; Sims, Karl ; Voorhees, Harry L.
Author_Institution :
Thinking Machines Corp., Cambridge, MA, USA
Volume :
16
Issue :
9
fYear :
1994
fDate :
9/1/1994
Firstpage :
915
Lastpage :
919
Abstract :
This paper shows that systems built on a simple statistical technique and a large training database can be automatically optimized to produce classification accuracies of 99% in the domain of handwritten digits. It also shows that the performance of these systems scales consistently with the size of the training database: the error rate is cut by more than half for every tenfold increase in the size of the training set, from 10 to 100,000 examples. Three distance metrics for the standard nearest neighbor classification system are investigated: a simple Hamming distance metric, a pixel distance metric, and a metric based on the extraction of penstroke features. Systems employing these metrics were trained and tested on a standard, publicly available database of nearly 225,000 digits provided by the National Institute of Standards and Technology. Additionally, a confidence metric is both introduced by the authors and discovered and optimized by the system. The new confidence measure proves superior to the commonly used nearest neighbor distance.
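For orientation, here is a minimal sketch of a standard nearest neighbor classifier using the simple Hamming distance metric named in the abstract. It assumes each digit is stored as a flattened binary bitmap and that all bitmaps have the same size. The function names, the bitmap representation, and the toy 3x3 data are hypothetical illustrations, not the authors' implementation. The pixel and penstroke metrics and the paper's own confidence measure are not reproduced. The returned distance is only the conventional nearest neighbor distance that the abstract treats as the baseline confidence.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: a flattened binary image, each entry 0 or 1.
Bitmap = Sequence[int]


def hamming_distance(a: Bitmap, b: Bitmap) -> int:
    """Count the pixel positions at which two equal-sized binary images differ."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbor(query: Bitmap, train: List[Tuple[Bitmap, str]]) -> Tuple[str, int]:
    """Return (label, distance) of the closest training example under Hamming distance.

    The distance returned is the plain nearest neighbor distance (the common
    confidence baseline), not the confidence metric proposed in the paper.
    Ties are resolved in favor of the earliest training example.
    """
    if not train:
        raise ValueError("training set is empty")
    best_image, best_label = train[0]
    best_dist = hamming_distance(query, best_image)
    for image, label in train[1:]:
        d = hamming_distance(query, image)
        if d < best_dist:  # strict '<' keeps the first example on ties
            best_label, best_dist = label, d
    return best_label, best_dist


if __name__ == "__main__":
    # Toy 3x3 "digits" (hypothetical data, not from the NIST database).
    train = [
        ([0, 1, 0, 0, 1, 0, 0, 1, 0], "1"),
        ([1, 1, 1, 1, 0, 1, 1, 1, 1], "0"),
    ]
    query = [0, 1, 0, 0, 1, 0, 1, 1, 0]
    print(nearest_neighbor(query, train))  # ('1', 1)
```

The linear scan is the simplest correct form of the technique. With a training set of the size described above, the cost per query grows linearly with the number of stored examples, which is why large-database nearest neighbor systems are computationally demanding.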
Keywords :
computer vision; learning (artificial intelligence); optical character recognition; Hamming distance metric; National Institute of Standards and Technology; classification accuracies; confidence metric; distance metrics; error rate; handwritten character classification; handwritten digits; large training database; nearest neighbor distance; penstroke features extraction; pixel distance metric; simple statistical technique; standard nearest neighbor classification system; Artificial intelligence; Character recognition; Databases; Error analysis; Machine learning; NIST; Nearest neighbor searches; Neural networks; Optical character recognition software; System testing;
fLanguage :
English
Journal_Title :
Pattern Analysis and Machine Intelligence, IEEE Transactions on
Publisher :
IEEE
ISSN :
0162-8828
Type :
jour
DOI :
10.1109/34.310689
Filename :
310689