Regional Information Center for Science and Technology - Comparative evaluation of techniques for word recognition improvement by incorporation of syntactic information

DocumentCode :

2183498

Title :

Comparative evaluation of techniques for word recognition improvement by incorporation of syntactic information

Author :

Malburg, Michael

Volume :

fYear :

1997

fDate :

18-20 Aug 1997

Firstpage :

784

Abstract :

Character recognition results are typically post-processed by dictionary look-up methods. Still, the quality of the resulting word hypotheses remains poor. This paper describes and compares three known methods for word-level post-processing of OCRed documents, all of which are based on purely statistical means of syntactic language modelling. The three methods are described, with particular attention to their application to word syntax. The implementations have been tested on about 90 printed business letters of varying quality. The methods were trained on newspaper texts comprising some 34 million running words. Although the test set and training set cover different fields of language, the results are quite encouraging and show the methods to be useful in general.

Keywords :

business data processing; computational linguistics; document image processing; glossaries; optical character recognition; statistical analysis; OCR; character recognition; dictionary look-up methods; document analysis; newspaper texts; printed business letters; quality; statistical; syntactic information; syntactic language modelling; test set; training; training set; word hypotheses; word recognition improvement; word syntax; word-level post-processing; Business; Character recognition; Context modeling; Dictionaries; Information analysis; Optical character recognition software; Performance analysis; System testing; Text analysis; Voting;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Document Analysis and Recognition, 1997., Proceedings of the Fourth International Conference on

Conference_Location :

Ulm

Print_ISBN :

0-8186-7898-4

Type :

conf

DOI :

10.1109/ICDAR.1997.620617

Filename :

620617

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2183498