DocumentCode
2014534
Title
A Weighted Finite-State Framework for Correcting Errors in Natural Scene OCR
Author
Beaufort, Richard ; Mancas-Thillou, Céline
Author_Institution
Multitel Res. Center, Mons
Volume
2
fYear
2007
fDate
23-26 Sept. 2007
Firstpage
889
Lastpage
893
Abstract
With the growing market for cheap cameras, natural scene text has to be handled efficiently. Some works deal with text detection in the image, while more recent ones point out the challenge of text extraction and recognition. We propose here an OCR correction system that handles traditional recognizer errors as well as those specific to natural scene images, i.e. cut characters, artistic display, incomplete sentences (present in advertisements) and out-of-vocabulary (OOV) words such as acronyms. The main algorithm is based on finite-state machines (FSMs) that deal with learned OCR confusions, capital/accented letters and lexicon look-up. Moreover, since the OCR is not treated as a black box, several of its outputs are taken into account to intermingle the recognition and correction steps. Detailed results on a public database of natural scene words are presented, along with future work.
Keywords
error correction; finite state machines; natural scenes; optical character recognition; text analysis; OCR correction system; capital/accented letters; finite-state machines; lexicon look-up; natural scene OCR; natural scene image; natural scene text; Cameras; Character recognition; Degradation; Displays; Hidden Markov models; Image recognition; Layout; Optical character recognition software; Transducers
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition, 2007. ICDAR 2007. Ninth International Conference on
Conference_Location
Parana
ISSN
1520-5363
Print_ISBN
978-0-7695-2822-9
Type
conf
DOI
10.1109/ICDAR.2007.4377043
Filename
4377043