Title :
Comparative evaluation of techniques for word recognition improvement by incorporation of syntactic information
Abstract :
Character recognition results are typically post-processed by dictionary look-up methods. Still, the quality of the resulting word hypotheses remains poor. This paper describes and compares three known methods for word-level post-processing of OCRed documents, all of which are based on purely statistical syntactic language modelling. The three methods are described, with particular attention to their application to word syntax. The implementations were tested on about 90 printed business letters of varying quality. The methods were trained on newspaper texts comprising some 34 million running words. Although the test set and training set cover different fields of language, the results are quite encouraging and show the methods to be useful in general.
Keywords :
business data processing; computational linguistics; document image processing; glossaries; optical character recognition; statistical analysis; OCR; character recognition; dictionary look-up methods; document analysis; newspaper texts; printed business letters; quality; statistical; syntactic information; syntactic language modelling; test set; training; training set; word hypotheses; word recognition improvement; word syntax; word-level post-processing; Business; Character recognition; Context modeling; Dictionaries; Information analysis; Optical character recognition software; Performance analysis; System testing; Text analysis; Voting
Conference_Title :
Document Analysis and Recognition, 1997., Proceedings of the Fourth International Conference on
Conference_Location :
Ulm
Print_ISBN :
0-8186-7898-4
DOI :
10.1109/ICDAR.1997.620617