DocumentCode :
2146776
Title :
Hybrid Approach to Adaptive OCR for Historical Books
Author :
Kluzner, Vladimir ; Tzadok, Asaf ; Chevion, Dan ; Walach, Eugene
Author_Institution :
Document Process. & Manage. Group, IBM Res. - Haifa, Haifa, Israel
fYear :
2011
fDate :
18-21 Sept. 2011
Firstpage :
900
Lastpage :
904
Abstract :
Optical character recognition (OCR) technology is widely used to convert scanned documents to text. However, historical books still remain a challenge for state-of-the-art OCR engines. This work proposes a new approach to the OCR of large bodies of text by creating an adaptive mechanism that adjusts itself to each text being processed. This approach provides significant improvements to the OCR results achieved. Our approach uses a modified hierarchical optical flow with a second-order regularization term to compare each new character with the set of super-symbols (character templates) by using its distance maps. The classification process is based on a hybrid approach combining measures of geometrical differences (spatial domain) and distortion gradients (feature domain).
Keywords :
document image processing; geometry; image classification; image sequences; optical character recognition; text analysis; adaptive OCR; character templates; classification process; distance maps; distortion gradients; feature domain; geometrical differences; hierarchical optical; historical books; optical character recognition; scanned documents; spatial domain; Adaptive optics; Character recognition; Engines; Nonlinear optics; Optical character recognition software; Optical distortion; Optical imaging; adaptive OCR; character classification; distance map; hierarchical optical flow; hybrid classifier; second order regularization term;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition (ICDAR), 2011 International Conference on
Conference_Location :
Beijing
ISSN :
1520-5363
Print_ISBN :
978-1-4577-1350-7
Electronic_ISBN :
1520-5363
Type :
conf
DOI :
10.1109/ICDAR.2011.183
Filename :
6065441