DocumentCode :
2183498
Title :
Comparative evaluation of techniques for word recognition improvement by incorporation of syntactic information
Author :
Malburg, Michael
Volume :
2
fYear :
1997
fDate :
18-20 Aug 1997
Firstpage :
784
Abstract :
Character recognition results are typically post-processed by dictionary look-up methods. Still, the quality of resulting word hypotheses remains lousy. This paper describes and compares three known methods for word-level post-processing of OCRed documents which all are based on purely statistical means of syntactic language modelling. The three methods compared and tested are described and especially their application to word syntax is related. The implementations have been tested on about 90 printed business letters of different quality. Training of the methods has been undertaken on newspaper texts with some 34 million running words. Although the test set and training set cover different fields of language, the results are quite encouraging and show the methods to be useful in general.
Keywords :
business data processing; computational linguistics; document image processing; glossaries; optical character recognition; statistical analysis; OCR; character recognition; dictionary look-up methods; document analysis; newspaper texts; printed business letters; quality; statistical; syntactic information; syntactic language modelling; test set; training; training set; word hypotheses; word recognition improvement; word syntax; word-level post-processing; Business; Character recognition; Context modeling; Dictionaries; Information analysis; Optical character recognition software; Performance analysis; System testing; Text analysis; Voting;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition, 1997., Proceedings of the Fourth International Conference on
Conference_Location :
Ulm
Print_ISBN :
0-8186-7898-4
Type :
conf
DOI :
10.1109/ICDAR.1997.620617
Filename :
620617