DocumentCode :
778869
Title :
Anatomy of a versatile page reader
Author :
Baird, Henry S.
Author_Institution :
AT&T Bell Lab., Murray Hill, NJ, USA
Volume :
80
Issue :
7
fYear :
1992
fDate :
7/1/1992 12:00:00 AM
Firstpage :
1059
Lastpage :
1065
Abstract :
An experimental printed-page reader that is easy to adapt to various languages is described. Changing the target language may involve simultaneous changes in symbol sets, typefaces, sizes of text, page layouts, linguistic contexts, and imaging defects. The strategy has been to isolate the effects of these sources of variation within separate, independent engineering subsystems. In this way, it has been possible to construct, with a minimum of manual effort, classifiers for arbitrary combinations of symbols, typefaces, sizes, and imaging defects. An attempt has been made to rid the algorithms of all language-specific rules, relying instead on automatic learning from examples and generalized table-driven methods. For some tasks it has been feasible to avoid language dependency altogether. Linguistic context can be exploited through data-directed filtering algorithms in a uniform and modular manner, so that preexisting tools developed by computational linguistics can readily be applied. These principles are illustrated by trials on English, Swedish, Tibetan, and special technical texts.
Keywords :
computational linguistics; document image processing; optical character recognition; OCR; automatic learning; data-directed filtering algorithms; imaging defects; language-specific rules; linguistic contexts; page layouts; page reader; symbol sets; table-driven methods; typefaces; Anatomy; Character recognition; Dictionaries; Encoding; Filtering algorithms; Optical character recognition software; Optical filters; Optical imaging; Shape control; Space technology
fLanguage :
English
Journal_Title :
Proceedings of the IEEE
Publisher :
IEEE
ISSN :
0018-9219
Type :
jour
DOI :
10.1109/5.156469
Filename :
156469