Title :
Optical font recognition for multi-font OCR and document processing
Author :
La Manna, Serena ; Colia, A.M. ; Sperduti, Alessandro
Author_Institution :
Dip. di Inf., Pisa Univ., Italy
Abstract :
In this paper we present a multi-font OCR system for document processing that simultaneously performs character recognition and font-style detection on digits belonging to a subset of the existing fonts. Detecting the font-style of the words in a document can guide a rough automatic classification of documents, and can also be used to improve character recognition. The system uses the tangent distance as a classification function in a nearest neighbour approach. It has to discriminate among different digits and, for the same character, among different font-styles. The nearest neighbour approach is always able to recognize the digit, but its performance in font detection is not optimal. To improve the performance of the system, we have used a discriminant model, the TD-Neuron, which is employed to discriminate between two similar classes. Some experimental results and prospective uses in document processing applications are presented.
Keywords :
character sets; document image processing; optical character recognition; TD-Neuron; character recognition; document processing; document words; font-style detection; multi-font OCR; nearest neighbour approach; optical font recognition; tangent distance; Application software; Character recognition; Content based retrieval; Electronic mail; Humans; Indexing; Information retrieval; Natural languages; Neurons; Optical character recognition software
Conference_Title :
Database and Expert Systems Applications, 1999. Proceedings. Tenth International Workshop on
Conference_Location :
Florence
Print_ISBN :
0-7695-0281-4
DOI :
10.1109/DEXA.1999.795244
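Illustration :
The abstract describes nearest neighbour classification with the tangent distance as the classification function. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a one-sided tangent distance (only the prototype carries tangent vectors), whereas the record does not say which variant the system uses. The function names, the (digit, font-style) labels, and the toy three-dimensional vectors are hypothetical, and the TD-Neuron is not modelled.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(tangents, eps=1e-12):
    """Gram-Schmidt: orthonormal basis for the span of the tangent vectors."""
    basis = []
    for t in tangents:
        w = list(t)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > eps:  # skip vectors that are (numerically) dependent
            basis.append([wi / norm for wi in w])
    return basis


def one_sided_tangent_distance(x, prototype, tangents):
    """Distance from x to the affine subspace prototype + span(tangents).

    Assumption: one-sided variant; with no tangents it reduces to the
    Euclidean distance.
    """
    basis = orthonormalize(tangents)
    d = [xi - pi for xi, pi in zip(x, prototype)]
    for b in basis:
        c = dot(d, b)
        d = [di - c * bi for di, bi in zip(d, b)]  # remove tangent component
    return math.sqrt(dot(d, d))


def classify(x, prototypes):
    """Nearest neighbour over (label, vector, tangents) triples.

    Labels here are hypothetical (digit, font-style) pairs.
    """
    best = min(
        prototypes,
        key=lambda p: one_sided_tangent_distance(x, p[1], p[2]),
    )
    return best[0]


# Toy data: the first prototype tolerates variation along the first axis.
prototypes = [
    (("1", "serif"), [0.0, 0.0, 0.0], [[1.0, 0.0, 0.0]]),
    (("1", "sans"), [4.0, 2.0, 0.0], []),
]
sample = [5.0, 0.5, 0.0]
label = classify(sample, prototypes)
```

In this toy case the plain Euclidean nearest neighbour of `sample` would be the second prototype, while the tangent distance selects the first, because the difference along the first prototype's tangent direction is discounted.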