  • DocumentCode
    1738353
  • Title
    Segmentation of a text printed in Korean and English using structure information and character recognizers
  • Author
    Hwang, Young-Sup ; Moon, Kyung-Ae ; Chi, Su-Young ; Jang, Dae-Geun ; Oh, Weon-Geun
  • Author_Institution
    Electronics and Telecommunications Research Institute, Taejon, South Korea
  • Volume
    3
  • fYear
    2000
  • fDate
    2000
  • Firstpage
    1586
  • Abstract
    The purpose of this research is to segment a text image printed in both Korean and English into character images, using the structure information in Korean and English characters together with Korean, English, and mixed-language character recognizers. The image cannot be separated using only the width and height of a character because, unlike those of a Korean character, those of an English character are not constant. Therefore, we first classify each block of the image as Korean or English using the structure information in Korean and English characters. If a block is determined to be Korean, we segment it using the average width of Korean characters in the text lines. If it is determined to be English, we segment it using a classical method for segmenting touching alphanumeric characters. If it cannot be determined, we find possible cut points using a vertical histogram and use the mixed-language recognizer to determine the right cut point. Because our method first classifies a block as Korean or English, it can run faster than the traditional method, which cannot identify the language. Each classified block can also be segmented more accurately because more specific knowledge about Korean and English characters can be applied.
  • Keywords
    image segmentation; natural languages; optical character recognition; text analysis; English; Korean; average width; character images; character recognizers; classified block; cut points; mixed language character recognizer; structure information; text image segmentation; text lines; touching alphanumeric characters; vertical histogram; Character recognition; Computer vision; Histograms; Image processing; Image recognition; Image segmentation; Laboratories; Moon; Natural languages; Text recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Systems, Man, and Cybernetics, 2000 IEEE International Conference on
  • Conference_Location
    Nashville, TN
  • ISSN
    1062-922X
  • Print_ISBN
    0-7803-6583-6
  • Type
    conf
  • DOI
    10.1109/ICSMC.2000.886248
  • Filename
    886248
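
The abstract mentions two segmentation steps that can be illustrated briefly: splitting a Korean block by the average character width, and finding possible cut points from a vertical histogram. The following is a minimal sketch of those two ideas only, not the authors' implementation. All function names, the `max_ink` threshold, and the choice of the centre of each low-ink run as the cut column are assumptions made for illustration; the record does not specify them. The language classification and recognizer-based selection of the right cut point are not shown.

```python
def vertical_histogram(image):
    """Count ink pixels in each column of a binary image (1 = ink, 0 = background)."""
    if not image:
        return []
    return [sum(column) for column in zip(*image)]


def candidate_cut_points(hist, max_ink=0):
    """Return one candidate cut column per interior run of low-ink columns.

    Assumption: a column is 'low ink' when its count is <= max_ink, and the
    centre of each run is used as the candidate. Runs touching the left or
    right border are treated as margins and skipped.
    """
    cuts = []
    run_start = None
    for x, count in enumerate(hist):
        if count <= max_ink:
            if run_start is None:
                run_start = x
        else:
            if run_start is not None and run_start > 0:
                cuts.append((run_start + x - 1) // 2)
            run_start = None
    return cuts


def split_by_average_width(block_width, avg_char_width):
    """Return evenly spaced cut columns for a block assumed to hold
    fixed-width (Korean) characters of roughly avg_char_width pixels."""
    n_chars = max(1, round(block_width / avg_char_width))
    return [round(i * block_width / n_chars) for i in range(1, n_chars)]


# Hypothetical 2-row binary block with two gaps between three ink regions.
example_block = [
    [1, 1, 0, 0, 0, 1, 1, 0, 1],
    [1, 0, 0, 0, 0, 1, 1, 0, 1],
]
example_hist = vertical_histogram(example_block)
example_cuts = candidate_cut_points(example_hist)
```

In a pipeline like the one the abstract describes, each candidate cut would then be scored by a character recognizer and the best-scoring cut kept; that step depends on the recognizer and is left out here.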