Title :
A full English sentence database for off-line handwriting recognition
Author :
Marti, U.-V. ; Bunke, H.
Author_Institution :
Inst. für Inf., Bern Univ., Switzerland
Abstract :
We present a new database for off-line handwriting recognition, together with a few preprocessing and text segmentation procedures. The database is based on the Lancaster-Oslo/Bergen (LOB) corpus. This corpus is a collection of texts that were used to generate forms, which were subsequently filled out by persons in their own handwriting. As of December 1998, the database includes 556 forms produced by approximately 250 different writers. The database consists of full English sentences. It could serve as a basis for a variety of handwriting recognition tasks. The main focus, however, is on recognition techniques that use linguistic knowledge beyond the lexicon level. This knowledge can be automatically derived from the corpus or it can be supplied from external sources.
Keywords :
computational linguistics; handwriting recognition; English sentence database; Lancaster-Oslo/Bergen corpus; linguistic knowledge; off-line handwriting recognition; preprocessing; text segmentation; Character recognition; Databases; Handwriting recognition; Informatics; Mathematics; NIST; Optical character recognition software; Read only memory; Speech recognition; Vocabulary;
Conference_Title :
Document Analysis and Recognition, 1999. ICDAR '99. Proceedings of the Fifth International Conference on
Conference_Location :
Bangalore
Print_ISBN :
0-7695-0318-7
DOI :
10.1109/ICDAR.1999.791885