DocumentCode
2870619
Title
Improved degraded document recognition with hybrid modeling techniques and character n-grams
Author
Brakensiek, Anja ; Willett, Daniel ; Rigoll, Gerhard
Author_Institution
Dept. of Comput. Sci., Gerhard-Mercator-Univ. Duisburg, Germany
Volume
4
fYear
2000
fDate
2000
Firstpage
438
Abstract
A robust multifont character recognition system for degraded documents, such as photocopies or faxes, is described. The system is based on hidden Markov models using discrete and hybrid modeling techniques, where the latter makes use of an information theory-based neural network. The presented recognition results refer to the SEDAL-database of English documents and were obtained without a dictionary. It is also demonstrated that a language model consisting of character n-grams yields significantly better recognition results. The resulting system clearly outperforms commercial systems and further reduces the error rate compared to previous results reported on this database.
Keywords
database management systems; document image processing; feature extraction; hidden Markov models; information theory; neural nets; optical character recognition; SEDAL-database; degraded document recognition; multifont character recognition; neural network; Character recognition; Computer science; Databases; Degradation; Error analysis; Image recognition; Optical character recognition software; Robustness; Testing
fLanguage
English
Publisher
IEEE
Conference_Titel
Pattern Recognition, 2000. Proceedings. 15th International Conference on
Conference_Location
Barcelona
ISSN
1051-4651
Print_ISBN
0-7695-0750-6
Type
conf
DOI
10.1109/ICPR.2000.902952
Filename
902952