  • DocumentCode
    384306
  • Title
    Word segmentation of printed text lines based on gap clustering and special symbol detection
  • Author
    Kim, Soo H. ; Jeong, Chang B. ; Kwag, Hee K. ; Suen, Ching Y.
  • Author_Institution
    Department of Computer Science, Chonnam National University, Kwangju, South Korea
  • Volume
    2
  • fYear
    2002
  • fDate
    2002
  • Firstpage
    320
  • Abstract
    This paper proposes a word segmentation method for machine-printed text lines. It uses gaps and special symbols as delimiters between words. A gap clustering technique identifies the gaps between words regardless of gap-size variations among different document images. Next, a special symbol detection technique is applied to find two types of special symbols lying between words. An experiment with 1,675 text lines in 100 different English and Korean documents shows that the proposed method achieves high word segmentation accuracy.
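The abstract does not say which clustering algorithm the authors use, so the following is only a minimal sketch of the general idea, assuming a plain one-dimensional 2-means over the gap widths of a single text line. The function names `cluster_gaps` and `split_into_words` and the sample widths are hypothetical and are not taken from the paper. Because the two cluster centres are estimated from the line itself, no fixed pixel threshold is needed, which is one way to cope with gap sizes that differ from document to document.

```python
def cluster_gaps(gaps, iterations=20):
    """Label each gap width as large (True) or small (False) using 1-D 2-means.

    Assumption: large gaps are treated as between-word gaps. This is a
    generic stand-in, not the clustering technique from the paper.
    """
    if not gaps:
        return []
    lo, hi = float(min(gaps)), float(max(gaps))
    if lo == hi:
        # All gaps are identical, so there is nothing to separate.
        return [False] * len(gaps)
    for _ in range(iterations):
        # Assign each gap to the nearer of the two centres.
        labels = [abs(g - hi) < abs(g - lo) for g in gaps]
        small = [g for g, is_large in zip(gaps, labels) if not is_large]
        large = [g for g, is_large in zip(gaps, labels) if is_large]
        # Both groups stay non-empty: the minimum gap is always nearer
        # to lo and the maximum gap is always nearer to hi.
        new_lo = sum(small) / len(small)
        new_hi = sum(large) / len(large)
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return [abs(g - hi) < abs(g - lo) for g in gaps]


def split_into_words(components, gaps):
    """Group components into words, starting a new word at every large gap.

    gaps[i] is the width of the white space between components[i] and
    components[i + 1].
    """
    if not components:
        return []
    if len(gaps) != len(components) - 1:
        raise ValueError("expected exactly one gap between each pair of components")
    words = [[components[0]]]
    for component, is_word_gap in zip(components[1:], cluster_gaps(gaps)):
        if is_word_gap:
            words.append([component])
        else:
            words[-1].append(component)
    return words
```

For example, `split_into_words(["a", "b", "c", "d"], [1, 9, 1])` groups the four components into two words, `[["a", "b"], ["c", "d"]]`. The special symbol detection step mentioned in the abstract is not covered by this sketch.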
  • Keywords
    character recognition; image segmentation; English documents; Korean documents; delimiters; gap clustering; gap clustering technique; machine-printed text lines; printed text lines; symbol detection; symbol detection technique; word segmentation; word segmentation method; Artificial intelligence; Computer science; Document image processing; Image segmentation; Machine intelligence; Optical character recognition software; Optical devices; Pattern recognition; Size measurement; White spaces
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Proceedings of the 16th International Conference on Pattern Recognition (ICPR 2002)
  • ISSN
    1051-4651
  • Print_ISBN
    0-7695-1695-X
  • Type
    conf
  • DOI
    10.1109/ICPR.2002.1048304
  • Filename
    1048304