  • DocumentCode
    2146776
  • Title
    Hybrid Approach to Adaptive OCR for Historical Books
  • Author
    Kluzner, Vladimir; Tzadok, Asaf; Chevion, Dan; Walach, Eugene
  • Author_Institution
    Document Processing & Management Group, IBM Research - Haifa, Haifa, Israel
  • fYear
    2011
  • fDate
    18-21 Sept. 2011
  • Firstpage
    900
  • Lastpage
    904
  • Abstract
    Optical character recognition (OCR) technology is widely used to convert scanned documents to text. However, historical books still remain a challenge for state-of-the-art OCR engines. This work proposes a new approach to the OCR of large bodies of text by creating an adaptive mechanism that adjusts itself to each text being processed. This approach provides significant improvements to the OCR results achieved. Our approach uses a modified hierarchical optical flow with a second-order regularization term to compare each new character with the set of super-symbols (character templates) by using its distance maps. The classification process is based on a hybrid approach combining measures of geometrical differences (spatial domain) and distortion gradients (feature domain).
  • Keywords
    document image processing; geometry; image classification; image sequences; optical character recognition; text analysis; adaptive OCR; character templates; classification process; distance maps; distortion gradients; feature domain; geometrical differences; hierarchical optical; historical books; scanned documents; spatial domain; Adaptive optics; Character recognition; Engines; Nonlinear optics; Optical character recognition software; Optical distortion; Optical imaging; character classification; distance map; hierarchical optical flow; hybrid classifier; second order regularization term
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2011 International Conference on Document Analysis and Recognition (ICDAR)
  • Conference_Location
    Beijing
  • ISSN
    1520-5363
  • Print_ISBN
    978-1-4577-1350-7
  • Electronic_ISBN
    1520-5363
  • Type
    conf
  • DOI
    10.1109/ICDAR.2011.183
  • Filename
    6065441