• DocumentCode
    2870619
  • Title

    Improved degraded document recognition with hybrid modeling techniques and character n-grams

  • Author

    Brakensiek, Anja ; Willett, Daniel ; Rigoll, Gerhard

  • Author_Institution
    Dept. of Comput. Sci., Gerhard-Mercator-Univ. Duisburg, Germany
  • Volume
    4
  • fYear
    2000
  • fDate
    2000
  • Firstpage
    438
  • Abstract
    A robust multifont character recognition system for degraded documents, such as photocopies or faxes, is described. The system is based on hidden Markov models using discrete and hybrid modeling techniques, where the latter make use of an information theory-based neural network. The presented recognition results refer to the SEDAL-database of English documents and were obtained without a dictionary. It is also demonstrated that a language model consisting of character n-grams yields significantly better recognition results. The resulting system clearly outperforms commercial systems and leads to further error rate reductions compared to previous results reached on this database.
  • Keywords
    database management systems; document image processing; feature extraction; hidden Markov models; information theory; neural nets; optical character recognition; SEDAL-database; degraded document recognition; multifont character recognition; neural network; Character recognition; Computer science; Databases; Degradation; Error analysis; Image recognition; Optical character recognition software; Robustness; Testing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pattern Recognition, 2000. Proceedings. 15th International Conference on
  • Conference_Location
    Barcelona
  • ISSN
    1051-4651
  • Print_ISBN
    0-7695-0750-6
  • Type

    conf

  • DOI
    10.1109/ICPR.2000.902952
  • Filename
    902952