DocumentCode
1290334
Title
Prototype extraction and adaptive OCR
Author
Xu, Yihong ; Nagy, George
Author_Institution
Hewlett-Packard Co., Palo Alto, CA, USA
Volume
21
Issue
12
fYear
1999
fDate
12/1/1999 12:00:00 AM
Firstpage
1280
Lastpage
1296
Abstract
To maintain OCR accuracy with decreasing quality of page image composition, production, and digitization, it is essential to tune the system to each document. We propose a prototype extraction method for document-specific OCR systems. The method automatically generates training samples from unsegmented text images and the corresponding transcripts. It is tolerant of transcription errors, so a transcript produced automatically by an imperfect omnifont OCR system can be used. The method is based on new algorithms for estimating character widths, character locations in a word, and match/nonmatch probabilities from unsegmented text. An experimental word recognition system is designed and developed to combine prototype extraction algorithms and segmentation-free word recognition. The system can adapt itself to different page images and achieve high recognition accuracy on heavily degraded print.
Keywords
document image processing; image segmentation; optical character recognition; probability; OCR accuracy; adaptive OCR; character locations; character widths; digitization; document-specific OCR systems; heavily degraded print; high recognition accuracy; match probabilities; nonmatch probabilities; page image composition; prototype extraction; segmentation-free word recognition; training samples; transcripts; unsegmented text images; Algorithm design and analysis; Character recognition; Degradation; Image recognition; Image segmentation; Optical character recognition software; Production systems; Prototypes; Text analysis; Typesetting
fLanguage
English
Journal_Title
Pattern Analysis and Machine Intelligence, IEEE Transactions on
Publisher
IEEE
ISSN
0162-8828
Type
jour
DOI
10.1109/34.817408
Filename
817408