DocumentCode :
2016094
Title :
Context-Sensitive Error Correction: Using Topic Models to Improve OCR
Author :
Wick, Michael L. ; Ross, Michael G. ; Learned-Miller, Erik G.
Author_Institution :
Univ. of Massachusetts Amherst, Amherst
Volume :
2
fYear :
2007
fDate :
23-26 Sept. 2007
Firstpage :
1168
Lastpage :
1172
Abstract :
Modern optical character recognition software relies on human interaction to correct misrecognized characters. Even though the software often reliably identifies low-confidence output, the simple language and vocabulary models employed are insufficient to automatically correct mistakes. This paper demonstrates that topic models, which automatically detect and represent an article's semantic context, reduce error by 7% over a global word distribution in a simulated OCR correction task. Detecting and leveraging context in this manner is an important step towards improving OCR.
Keywords :
optical character recognition; OCR; context-sensitive error correction; global word distribution; human interaction; optical character recognition software; recognized characters; topic models; Character recognition; Context modeling; Error correction; Frequency; Hidden Markov models; Humans; Linear discriminant analysis; Optical character recognition software; Tongue; Vocabulary;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Document Analysis and Recognition, 2007. ICDAR 2007. Ninth International Conference on
Conference_Location :
Parana
ISSN :
1520-5363
Print_ISBN :
978-0-7695-2822-9
Type :
conf
DOI :
10.1109/ICDAR.2007.4377099
Filename :
4377099