DocumentCode
1738353
Title
Segmentation of a text printed in Korean and English using structure information and character recognizers
Author
Hwang, Young-Sup ; Moon, Kyung-Ae ; Chi, Su-Young ; Jang, Dae-Geun ; Oh, Weon-Geun
Author_Institution
Electron. & Telecommun. Res. Inst., Taejon, South Korea
Volume
3
fYear
2000
fDate
2000
Firstpage
1586
Abstract
The purpose of this research is to segment a text image printed in both Korean and English into character images, using the structural information of Korean and English characters together with Korean, English, and mixed-language character recognizers. Such an image cannot be segmented using character width and height alone, because the width and height of English characters vary, whereas those of Korean characters are nearly constant. Therefore, we first classify each block of the image as Korean or English using the structural information of the two scripts. If a block is classified as Korean, we segment it using the average width of Korean characters in the text line. If it is classified as English, we segment it using a classical method for segmenting touching alphanumeric characters. If it cannot be classified, we find candidate cut points using a vertical histogram and use the mixed-language recognizer to select the correct cut point. Because our method classifies each block as Korean or English first, it runs faster than traditional methods that cannot identify the language. Each classified block can also be segmented more accurately, because more specific knowledge about Korean or English characters can be applied.
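As an illustration only (not the authors' implementation), the two geometric steps named in the abstract, finding candidate cut points from a vertical histogram and splitting a Korean block by average character width, might be sketched as follows. The image representation (a binary list of rows, 1 = ink), the `max_ink` valley threshold, and all function names are assumptions introduced here. The Korean/English structural classifier and the mixed-language recognizer that chooses among candidate cuts are not modeled.

```python
from typing import List, Sequence

# Assumed representation: a binary image as a list of rows, 1 = ink, 0 = background.
Image = Sequence[Sequence[int]]


def vertical_histogram(img: Image) -> List[int]:
    """Count ink pixels in each column (the vertical projection profile)."""
    if not img:
        return []
    width = len(img[0])
    return [sum(row[x] for row in img) for x in range(width)]


def candidate_cuts(hist: Sequence[int], max_ink: int = 0) -> List[int]:
    """Return one candidate cut column per interior valley.

    A valley is a run of columns whose ink count is <= max_ink (a hypothetical
    threshold). Each run is reduced to its middle column. Runs touching the
    left or right border are ignored, since they are margins, not gaps
    between characters.
    """
    cuts: List[int] = []
    start = None
    for x, v in enumerate(hist):
        if v <= max_ink:
            if start is None:
                start = x  # a valley begins
        else:
            if start is not None and start > 0:
                cuts.append((start + x - 1) // 2)  # middle of the closed valley
            start = None
    return cuts  # a run still open at the right border is dropped


def split_by_average_width(block_width: int, avg_width: float) -> List[int]:
    """Evenly spaced cut columns for a block assumed to contain Korean characters.

    The number of characters is estimated as block_width / avg_width, rounded.
    """
    n = max(1, round(block_width / avg_width))
    return [round(i * block_width / n) for i in range(1, n)]
```

For example, a block 60 pixels wide with an average Korean character width of 20 pixels yields cuts at columns 20 and 40. In the pipeline the abstract describes, the output of `candidate_cuts` for an unclassified block would only be a set of hypotheses; choosing the correct cut is left to the mixed-language recognizer.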
Keywords
image segmentation; natural languages; optical character recognition; text analysis; English; Korean; average width; character images; character recognizers; classified block; cut points; mixed language character recognizer; structure information; text image segmentation; text lines; touching alphanumeric characters; vertical histogram; Character recognition; Computer vision; Histograms; Image processing; Image recognition; Image segmentation; Laboratories; Natural languages; Text recognition
fLanguage
English
Publisher
IEEE
Conference_Title
Systems, Man, and Cybernetics, 2000 IEEE International Conference on
Conference_Location
Nashville, TN
ISSN
1062-922X
Print_ISBN
0-7803-6583-6
Type
conf
DOI
10.1109/ICSMC.2000.886248
Filename
886248