Title :
Segmentation of a text printed in Korean and English using structure information and character recognizers
Author :
Hwang, Young-Sup ; Moon, Kyung-Ae ; Chi, Su-Young ; Jang, Dae-Geun ; Oh, Weon-Geun
Author_Institution :
Electron. & Telecommun. Res. Inst., Taejon, South Korea
Abstract :
The purpose of the research presented is to segment a text image printed in both Korean and English into character images, using the structural information of Korean and English characters together with Korean, English, and mixed-language character recognizers. Such an image cannot be segmented using character width and height alone because, unlike Korean characters, English characters do not have a constant width and height. Therefore, we first classify each block of the image as Korean or English using the structural information of Korean and English characters. If a block is classified as Korean, we segment it using the average width of Korean characters in the text line. If it is classified as English, we segment it using a classical method for segmenting touching alphanumeric characters. If the language cannot be determined, we find candidate cut points using a vertical histogram and use the mixed-language recognizer to select the correct cut point. Because our method first classifies each block as Korean or English, it runs faster than traditional methods that cannot identify the language. Each classified block can also be segmented more accurately, because more specific knowledge about Korean or English characters can be applied.
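The segmentation strategy in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: images are binary lists of 0/1 rows, the Korean/English classification step is assumed to have already labeled the block, and `max_ink`, `score`, and all function names are hypothetical. The English branch (the classical touching-character method) is not sketched. Korean blocks are cut at equal intervals derived from the average Korean character width; for undetermined blocks, a vertical histogram yields candidate cuts and a stand-in recognizer score picks among them.

```python
from typing import Callable, List, Sequence

Image = List[List[int]]  # hypothetical representation: rows of 0 (background) / 1 (ink)


def vertical_histogram(img: Image) -> List[int]:
    """Count ink pixels in each column of a binary image."""
    if not img:
        return []
    return [sum(row[x] for row in img) for x in range(len(img[0]))]


def candidate_cuts(hist: Sequence[int], max_ink: int = 0) -> List[int]:
    """Return one cut column (the run midpoint) per run of low-ink columns.

    Runs touching the left or right border are skipped, since cutting there
    splits nothing. `max_ink` is an assumed threshold, not from the paper.
    """
    cuts: List[int] = []
    n, x = len(hist), 0
    while x < n:
        if hist[x] <= max_ink:
            start = x
            while x < n and hist[x] <= max_ink:
                x += 1
            if start > 0 and x < n:  # x is now the exclusive end of the run
                cuts.append((start + x - 1) // 2)
        else:
            x += 1
    return cuts


def split_by_average_width(width: int, avg_width: float) -> List[int]:
    """Cut positions for a Korean block, assuming near-constant glyph width."""
    n_chars = max(1, round(width / avg_width))
    step = width / n_chars
    return [round(i * step) for i in range(1, n_chars)]


def crop_columns(img: Image, x0: int, x1: int) -> Image:
    """Return columns [x0, x1) of the image."""
    return [row[x0:x1] for row in img]


def best_cut(img: Image, cuts: Sequence[int],
             score: Callable[[Image], float]) -> int:
    """Pick the cut whose left and right pieces score highest in total.

    `score` stands in for the mixed-language recognizer (hypothetical
    interface: higher means a more confident single-character match).
    """
    width = len(img[0])
    return max(cuts, key=lambda c: score(crop_columns(img, 0, c))
               + score(crop_columns(img, c, width)))


def segment_block(img: Image, lang: str, avg_korean_width: float,
                  score: Callable[[Image], float]) -> List[int]:
    """Dispatch on the (already classified) language of a block."""
    width = len(img[0])
    if lang == "korean":
        return split_by_average_width(width, avg_korean_width)
    if lang == "unknown":
        cuts = candidate_cuts(vertical_histogram(img))
        return [best_cut(img, cuts, score)] if cuts else []
    # English: the classical touching-character method is not sketched here.
    raise NotImplementedError("English segmentation not sketched")
```

For example, a 60-pixel-wide Korean block with an average character width of 20 pixels is cut at columns 20 and 40, while an undetermined block falls back to histogram gaps plus recognizer scoring. Taking the midpoint of each blank run is one simple choice; a real system would likely also consider low-ink (not fully blank) columns where characters touch.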
Keywords :
image segmentation; natural languages; optical character recognition; text analysis; English; Korean; average width; character images; character recognizers; classified block; cut points; mixed language character recognizer; structure information; text image segmentation; text lines; touching alphanumeric characters; vertical histogram; Character recognition; Computer vision; Histograms; Image processing; Image recognition; Image segmentation; Laboratories; Moon; Natural languages; Text recognition;
Conference_Title :
Systems, Man, and Cybernetics, 2000 IEEE International Conference on
Conference_Location :
Nashville, TN
Print_ISBN :
0-7803-6583-6
DOI :
10.1109/ICSMC.2000.886248