Regional Information Center for Science and Technology (RICeST) - An OCR based on character shape codes and lexical information

DocumentCode :

3313984

Title :

An OCR based on character shape codes and lexical information

Author :

Spitz, A. Lawrence

Author_Institution :

Fuji Xerox Palo Alto Lab., CA, USA

Volume :

fYear :

1995

fDate :

14-16 Aug 1995

Firstpage :

723

Abstract :

We describe an OCR process which has as its principal attributes high speed of operation and tunability to the lexical content of the documents to which it is applied. This process relies on the transformation of the text image into character shape codes, a rapid and robust process, and on special lexica which contain information on the “shape” of words and the character ambiguities present within particular word shape classifications. We rely on the structure of English (in the current case) and the high percentage of singleton mappings between the shape codes and the characters in the words. Considerable ambiguity is removed by simple lookup in the specially tuned and structured lexicon and substitution on a character-by-character basis. Ambiguity is further reduced by template matching using exemplars derived from surrounding text, taking advantage of the local consistency of font, face and size as well as image quality.
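
The lexicon lookup described in the abstract can be illustrated with a minimal sketch. The shape classes below (ascender `A`, descender `g`, dotted `i` and `j`, x-height `x`) are a simplified assumption made for illustration and are not taken from the paper; the function names `shape_code`, `word_shape`, `build_lexicon` and `resolve` are likewise hypothetical. The sketch works on text rather than images, and it does not cover the template-matching step.

```python
from collections import defaultdict

# Assumed, simplified shape classes; the paper's actual code set may differ.
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gpqy")


def shape_code(ch):
    """Map one character to a coarse shape class (illustrative only)."""
    if ch.isupper() or ch.isdigit() or ch in ASCENDERS:
        return "A"  # rises above the x-height
    if ch in DESCENDERS:
        return "g"  # drops below the baseline
    if ch == "i":
        return "i"  # x-height with a dot
    if ch == "j":
        return "j"  # descender with a dot
    return "x"      # everything else is treated as x-height


def word_shape(word):
    """Concatenate the shape codes of a word's characters."""
    return "".join(shape_code(c) for c in word)


def build_lexicon(words):
    """Group words by word shape so that lookup is a single dict access."""
    lexicon = defaultdict(list)
    for w in words:
        lexicon[word_shape(w)].append(w)
    return dict(lexicon)


def resolve(shape, lexicon):
    """Return, for each position, the characters the lexicon allows there."""
    candidates = lexicon.get(shape, [])
    return [sorted({w[i] for w in candidates}) for i in range(len(shape))]


lexicon = build_lexicon(["the", "of", "and", "to", "in", "cat", "eat"])
print(resolve(word_shape("the"), lexicon))  # one candidate per position
print(resolve("xxA", lexicon))              # several candidates remain
```

In this toy lexicon the shape of "the" maps to a single word, so every position is resolved by lookup alone. The shape `xxA` is shared by "and", "cat" and "eat", which leaves per-position ambiguity of the kind the abstract says is reduced further by template matching.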

Keywords :

image matching; optical character recognition; OCR; character shape codes; lexical information; lexicon; singleton mappings; template matching; text image; word shape classifications; Character recognition; High speed optical techniques; Image quality; Image reconstruction; Image resolution; Information retrieval; Laboratories; Optical character recognition software; Robustness; Shape;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Document Analysis and Recognition, 1995., Proceedings of the Third International Conference on

Conference_Location :

Montreal, Que.

Print_ISBN :

0-8186-7128-9

Type :

conf

DOI :

10.1109/ICDAR.1995.602005

Filename :

602005

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3313984