• DocumentCode
    2504031
  • Title
    OCR Post-processing Using Weighted Finite-State Transducers
  • Author
    Llobet, Rafael ; Cerdan-Navarro, J.-R. ; Perez-Cortes, Juan-Carlos ; Arlandis, Joaquim
  • Author_Institution
    Inst. Tecnol. de Inf., Univ. Politec. de Valencia, Valencia, Spain
  • fYear
    2010
  • fDate
    23-26 Aug. 2010
  • Firstpage
    2021
  • Lastpage
    2024
  • Abstract
    A new approach for Stochastic Error-Correcting Language Modeling based on Weighted Finite-State Transducers (WFSTs) is proposed as a method to post-process the results of an optical character recognizer (OCR). Instead of using the recognized string as an input to the transducer, in our approach the complete set of OCR hypotheses, a sequence of vectors of a posteriori class probabilities, is used to build a WFST that is then composed with independent WFSTs for the error and language models. This combines the practical advantages of a de-coupled (OCR + post-processor) model with the full power of an integrated model.
  • Keywords
    finite state machines; optical character recognition; probability; OCR hypotheses; OCR post-processing; posteriori class probability; stochastic error-correcting language modeling; weighted finite-state transducers; Biological system modeling; Computational modeling; Optical character recognition software; Probabilistic logic; Stochastic processes; Transducers; Viterbi algorithm; Language Modeling; Weighted Finite-State Automata
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pattern Recognition (ICPR), 2010 20th International Conference on
  • Conference_Location
    Istanbul
  • ISSN
    1051-4651
  • Print_ISBN
    978-1-4244-7542-1
  • Type
    conf
  • DOI
    10.1109/ICPR.2010.498
  • Filename
    5597254
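
The idea in the abstract, scoring every OCR hypothesis against an error model and a language model instead of committing to the single recognized string, can be sketched in a few lines. This is a simplified illustration and not the authors' implementation: it uses no WFST library, the language model is a plain word list with probabilities, the error model allows substitutions only, and the names `correct`, `p_keep`, and the toy data are all assumptions made for this example. Costs are negative log probabilities and the cheapest path is kept, which mirrors a shortest-path search over a composed transducer.

```python
import math


def correct(posteriors, lexicon, p_keep=0.9):
    """Pick the lexicon word with the lowest combined cost.

    posteriors: list of dicts, one per character position, mapping each
                candidate character to its OCR a posteriori probability.
    lexicon:    dict mapping word -> prior probability (toy language model).
    p_keep:     assumed probability that the error model leaves a character
                unchanged; the remainder is spread over substitutions.
    """
    alphabet = {c for vec in posteriors for c in vec}
    alphabet |= {c for word in lexicon for c in word}
    p_sub = (1.0 - p_keep) / max(len(alphabet) - 1, 1)

    best_word, best_cost = None, math.inf
    for word, p_word in lexicon.items():
        # Substitution-only error model: word length must match the input.
        if len(word) != len(posteriors) or p_word <= 0:
            continue
        cost = -math.log(p_word)  # language-model cost
        for target, vec in zip(word, posteriors):
            # Cheapest OCR hypothesis c that the error model maps to target.
            cost += min(
                (
                    -math.log(p) - math.log(p_keep if c == target else p_sub)
                    for c, p in vec.items()
                    if p > 0
                ),
                default=math.inf,
            )
        if cost < best_cost:
            best_word, best_cost = word, cost
    return best_word, best_cost


# Toy input: the top-1 OCR string would be "cot", but "a" is a close second.
posteriors = [
    {"c": 0.9, "e": 0.1},
    {"o": 0.55, "a": 0.45},
    {"t": 0.8, "l": 0.2},
]
lexicon = {"cat": 0.6, "cot": 0.05, "eat": 0.35}
word, cost = correct(posteriors, lexicon)
```

With this toy data the full posterior vectors let the language model prefer "cat" over the top-1 string "cot", which is the benefit the abstract attributes to passing the complete hypothesis set to the post-processor.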