Regional Information Center for Science and Technology

DocumentCode :

1290334

Title :

Prototype extraction and adaptive OCR

Author :

Xu, Yihong ; Nagy, George

Author_Institution :

Hewlett-Packard Co., Palo Alto, CA, USA

Volume :

Issue :

fYear :

1999

fDate :

12/1/1999

Firstpage :

1280

Lastpage :

1296

Abstract :

To maintain OCR accuracy with decreasing quality of page image composition, production, and digitization, it is essential to tune the system to each document. We propose a prototype extraction method for document-specific OCR systems. The method automatically generates training samples from unsegmented text images and the corresponding transcripts. It is tolerant of transcription errors, so a transcript produced automatically by an imperfect omnifont OCR system can be used. The method is based on new algorithms for estimating character widths, character locations in a word, and match/nonmatch probabilities from unsegmented text. An experimental word recognition system is designed and developed to combine prototype extraction algorithms and segmentation-free word recognition. The system can adapt itself to different page images and achieve high recognition accuracy on heavily degraded print.

Keywords :

document image processing; image segmentation; optical character recognition; probability; OCR accuracy; adaptive OCR; character locations; character widths; digitization; document-specific OCR systems; heavily degraded print; high recognition accuracy; match probabilities; nonmatch probabilities; page image composition; prototype extraction; segmentation-free word recognition; training samples; transcripts; unsegmented text images; Algorithm design and analysis; Character recognition; Degradation; Image recognition; Image segmentation; Optical character recognition software; Production systems; Prototypes; Text analysis; Typesetting

fLanguage :

English

Journal_Title :

Pattern Analysis and Machine Intelligence, IEEE Transactions on

Publisher :

IEEE

ISSN :

0162-8828

Type :

jour

DOI :

10.1109/34.817408

Filename :

817408

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1290334