DocumentCode :
2195132
Title :
Efficient detection of abnormalities in large OCR databases
Author :
HA, Thien M.
Author_Institution :
Inst. of Comput. Sci. & Appl. Math., Berne Univ., Switzerland
Volume :
2
fYear :
1997
fDate :
18-20 Aug 1997
Firstpage :
1006
Abstract :
Building large optical character recognition (OCR) databases is time-consuming and tedious. Moreover, the process is error-prone because segmentation is difficult and labelling is uncertain. When the database is very large, say one million patterns, human errors due to fatigue and inattention become a critical factor. This paper discusses one method to alleviate the burden caused by these problems. Specifically, the method allows automatic detection of abnormalities, e.g. mislabelling, and thus may help clean up a labelled database. The method is based on the optimum class-selective rejection rule. As a test case, the method is applied to the NIST databases containing nearly 300,000 handwritten numerals.
Keywords :
data integrity; document image processing; errors; handwriting recognition; image segmentation; optical character recognition; very large databases; visual databases; NIST databases; database abnormality detection; error-prone; fatigue; handwriting style; handwritten numerals; human errors; inattention; labelled database; labelling; large OCR databases; optical character recognition databases; optimum class-selective rejection rule; segmentation; time-consuming; uncertainty; Character recognition; Databases; Error analysis; Error correction; Labeling; NIST; Optical character recognition software; Pattern recognition; Probability; Testing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition, 1997., Proceedings of the Fourth International Conference on
Conference_Location :
Ulm
Print_ISBN :
0-8186-7898-4
Type :
conf
DOI :
10.1109/ICDAR.1997.620661
Filename :
620661