Title :
Efficient detection of abnormalities in large OCR databases
Author_Institution :
Inst. of Comput. Sci. & Appl. Math., Berne Univ., Switzerland
Abstract :
Building large optical character recognition (OCR) databases is time-consuming and tedious. Moreover, the process is error-prone because of the difficulty of segmentation and the uncertainty in labelling. When the database is very large, say one million patterns, human errors caused by fatigue and inattention become a critical factor. This paper presents a method to alleviate the burden caused by these problems. Specifically, the method allows automatic detection of abnormalities such as mislabelling, and thus may help clean up a labelled database. The method is based on the optimum class-selective rejection rule. As a test case, the method is applied to the NIST databases, which contain nearly 300,000 handwritten numerals.
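The abstract does not spell out the detection procedure. The following is a minimal sketch of one plausible reading, not the authors' method. It assumes that some trained classifier supplies estimated posterior probabilities P(class | x) for each pattern, and that the class-selective rule takes threshold form: keep every class whose posterior is at least a threshold t. Under that assumption, a pattern is flagged as a possible abnormality when its human-assigned label falls outside the selected class set. The function names, the threshold value, and the toy numbers below are all hypothetical and do not come from the paper or the NIST data.

```python
from typing import List, Sequence, Set


def class_selective_set(posteriors: Sequence[float], t: float) -> Set[int]:
    """Hypothetical helper: indices of classes whose estimated posterior
    P(class | x) is at least the threshold t (assumed threshold-form rule)."""
    return {i for i, p in enumerate(posteriors) if p >= t}


def flag_suspicious(posteriors_per_pattern: Sequence[Sequence[float]],
                    labels: Sequence[int],
                    t: float) -> List[int]:
    """Hypothetical helper: return indices of patterns whose assigned label
    is not in the class-selective set, i.e. candidate labelling errors."""
    flagged = []
    for idx, (post, label) in enumerate(zip(posteriors_per_pattern, labels)):
        if label not in class_selective_set(post, t):
            flagged.append(idx)
    return flagged


# Toy three-class example with made-up posteriors (illustration only).
posteriors = [
    [0.02, 0.90, 0.08],  # labelled 1: class 1 clearly selected, label kept
    [0.70, 0.20, 0.10],  # labelled 2: selected set is {0, 1}, label flagged
    [0.45, 0.45, 0.10],  # labelled 0: ambiguous pattern, but 0 is selected
]
labels = [1, 2, 0]

print(flag_suspicious(posteriors, labels, t=0.2))  # → [1]
```

Raising t shrinks the selected sets and flags more patterns for human review. Lowering it flags fewer. In this sketch, t is the single knob that trades review effort against missed mislabellings.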
Keywords :
data integrity; document image processing; errors; handwriting recognition; image segmentation; optical character recognition; very large databases; visual databases; NIST databases; database abnormality detection; error-prone; fatigue; handwriting style; handwritten numerals; human errors; inattention; labelled database; labelling; large OCR databases; optical character recognition databases; optimum class-selective rejection rule; segmentation; time-consuming; uncertainty; Character recognition; Databases; Error analysis; Error correction; Labeling; NIST; Optical character recognition software; Pattern recognition; Probability; Testing;
Conference_Title :
Proceedings of the Fourth International Conference on Document Analysis and Recognition, 1997
Conference_Location :
Ulm
Print_ISBN :
0-8186-7898-4
DOI :
10.1109/ICDAR.1997.620661