DocumentCode :
1738353
Title :
Segmentation of a text printed in Korean and English using structure information and character recognizers
Author :
Hwang, Young-Sup ; Moon, Kyung-Ae ; Chi, Su-Young ; Jang, Dae-Geun ; Oh, Weon-Geun
Author_Institution :
Electron. & Telecommun. Res. Inst., Taejon, South Korea
Volume :
3
fYear :
2000
fDate :
2000
Firstpage :
1586
Abstract :
The purpose of this research is to segment a text image printed in both Korean and English into individual character images, using the structural information of Korean and English characters together with Korean, English, and mixed-language character recognizers. Such an image cannot be segmented using character width and height alone, because the width and height of English characters vary, whereas those of Korean characters are constant. Therefore, we first classify each block of the image as Korean or English using the structural information of Korean and English characters. If a block is classified as Korean, we segment it using the average width of the Korean characters in the text lines. If it is classified as English, we segment it with a classical method for segmenting touching alphanumeric characters. If its language cannot be determined, we find candidate cut points using a vertical histogram and use the mixed-language recognizer to select the correct cut point. Because our method classifies each block as Korean or English first, it runs faster than traditional methods that cannot identify the language. Each classified block can also be segmented more accurately, because more specific knowledge about Korean or English characters can be applied.
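The segmentation steps in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation. The block is assumed to be a binary image given as rows of 0/1 pixels (1 = ink). The language label is assumed to come from an upstream classifier, and `recognizer` is a hypothetical callable that returns a confidence in [0, 1] for a character sub-image. The English branch (the "classical method" for touching alphanumerics) is not specified in the abstract, so it is left unimplemented. Choosing among histogram cut points by maximizing the product of recognizer confidences, via dynamic programming, is an assumed, commonly used scheme, not necessarily the paper's exact rule.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Assumed encoding: binary block image as rows of 0/1 pixels, 1 = ink.
Image = Sequence[Sequence[int]]
Span = Tuple[int, int]  # half-open column range [start, end)


def vertical_histogram(block: Image) -> List[int]:
    """Count the ink pixels in each column of the block."""
    width = len(block[0]) if block else 0
    return [sum(row[x] for row in block) for x in range(width)]


def candidate_cuts(hist: Sequence[int], max_ink: int = 0) -> List[int]:
    """Columns with at most max_ink ink pixels are possible cut points (assumed rule)."""
    return [x for x, count in enumerate(hist) if count <= max_ink]


def segment_korean(width: int, avg_width: float) -> List[Span]:
    """Split a Korean block into equal pieces sized by the average character width."""
    n = max(1, round(width / avg_width))
    bounds = [round(i * width / n) for i in range(n + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(n)]


def choose_cuts(block: Image, cuts: Sequence[int],
                recognizer: Callable[[Image], float]) -> List[Span]:
    """Pick the subset of cut points whose pieces maximize the product of
    recognizer confidences (sum of logs), using dynamic programming."""
    width = len(block[0])
    points = sorted({0, width, *(c for c in cuts if 0 < c < width)})
    best = [-math.inf] * len(points)  # best log-score ending at points[j]
    back = [-1] * len(points)         # previous cut index on the best path
    best[0] = 0.0
    for j in range(1, len(points)):
        for i in range(j):
            piece = [row[points[i]:points[j]] for row in block]
            score = best[i] + math.log(max(recognizer(piece), 1e-9))
            if score > best[j]:
                best[j], back[j] = score, i
    spans: List[Span] = []
    j = len(points) - 1
    while j > 0:  # walk back from the right edge to column 0
        i = back[j]
        spans.append((points[i], points[j]))
        j = i
    return spans[::-1]


def segment_block(block: Image, language: str, avg_korean_width: float,
                  recognizer: Callable[[Image], float]) -> List[Span]:
    """Dispatch on the (externally supplied) language label of a block."""
    if language == "korean":
        return segment_korean(len(block[0]), avg_korean_width)
    if language == "unknown":
        cuts = candidate_cuts(vertical_histogram(block))
        return choose_cuts(block, cuts, recognizer)
    raise NotImplementedError("English touching-character method is not sketched here")
```

For a 2-row block `[[1,1,0,1,1,0],[1,1,0,1,1,0]]`, the histogram is `[2,2,0,2,2,0]` and the candidate cuts are columns 2 and 5. With a toy recognizer that returns high confidence only for pieces containing exactly two ink columns, `choose_cuts` keeps the cut at column 2 and drops the one at column 5, returning `[(0, 2), (2, 6)]`. Using log-confidences rather than a plain sum keeps the search from favoring over-segmentation, because every extra piece adds a non-positive term.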
Keywords :
image segmentation; natural languages; optical character recognition; text analysis; English; Korean; average width; character images; character recognizers; classified block; cut points; mixed language character recognizer; structure information; text image segmentation; text lines; touching alphanumeric characters; vertical histogram; Character recognition; Computer vision; Histograms; Image processing; Image recognition; Image segmentation; Laboratories; Moon; Natural languages; Text recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Systems, Man, and Cybernetics, 2000 IEEE International Conference on
Conference_Location :
Nashville, TN
ISSN :
1062-922X
Print_ISBN :
0-7803-6583-6
Type :
conf
DOI :
10.1109/ICSMC.2000.886248
Filename :
886248