DocumentCode :
2396351
Title :
Text-line extraction and character recognition of Japanese newspaper headlines with graphical designs
Author :
Sawaki, Minako ; Hagita, Norihiro
Author_Institution :
NTT Basic Res. Labs., Kanagawa, Japan
Volume :
3
fYear :
1996
fDate :
25-29 Aug 1996
Firstpage :
73
Abstract :
Conventional OCR fails to recognize most characters in Japanese newspaper headlines with graphical designs because the designs are difficult to remove. This paper proposes a method that recognizes such characters without removing the designs. First, text-line regions are extracted from the local distribution of combinations of black and white runs observed in a rectangular window as the window is shifted pixel by pixel along the text-line direction. Characters in the extracted text-line region are then recognized by displacement matching. Adaptive thresholding against the degree of degradation suppresses the spurious candidates that displacement matching yields, even in the presence of graphical designs. Experimental results for fifty Japanese newspaper headlines show that the method achieves a recognition rate of 97.7%, much higher than that of a conventional method (17.0%).
Keywords :
document image processing; image segmentation; optical character recognition; Japanese newspaper headlines; adaptive thresholding; black and white runs; character recognition; degree of degradation; displacement matching; graphical designs; recognition rate; text-line extraction; Character recognition; Degradation; Design methodology; Image databases; Laboratories; Optical character recognition software; Optical devices; Pixel; Robustness; Software libraries
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Pattern Recognition, 1996., Proceedings of the 13th International Conference on
Conference_Location :
Vienna
ISSN :
1051-4651
Print_ISBN :
0-8186-7282-X
Type :
conf
DOI :
10.1109/ICPR.1996.546797
Filename :
546797