  • DocumentCode
    595365
  • Title
    Character extraction in web image for text recognition
  • Author
    Bolan Su ; Shijian Lu ; Trung Quy Phan ; Chew Lim Tan
  • Author_Institution
    Dept. of Comput. Sci., Nat. Univ. of Singapore, Singapore, Singapore
  • fYear
    2012
  • fDate
    11-15 Nov. 2012
  • Firstpage
    3042
  • Lastpage
    3045
  • Abstract
    Images with text are frequently used on the Internet for different purposes. Automatic recognition of text in web images plays an important role in the extraction and retrieval of web information. However, web images are usually of low resolution and contain artifacts and special effects, which makes word recognition a challenging task even after the text has been localized. In this paper, we propose a robust text recognition technique that efficiently converts web images into text format. The proposed technique first applies L0-norm smoothing to increase the edge contrast of the input web images. The images are then binarized on each color channel. Connected component analysis is then performed to identify possible character components. Finally, the character candidates are recognized by the OCR engine after skew correction. Extensive experiments have been conducted on the latest ICDAR 2011 Robust Reading Competition dataset for born-digital text. The experimental results show the superior performance of the proposed technique.
  • Keywords
    Internet; edge detection; image colour analysis; image retrieval; optical character recognition; smoothing methods; text analysis; text detection; ICDAR 2011 robust reading competition dataset; Internet; L0 norm smoothing; OCR engine; Web image; Web information extraction; Web information retrieval; automatic text recognition; born-digital text; character component identification; character extraction; color channel; connected component analysis; edge contrast; image binarization; low resolution images; robust text recognition technique; skew correction; text format; text localization; word recognition; Image color analysis; Image recognition; Optical character recognition software; Robustness; Smoothing methods; Testing; Text recognition;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Pattern Recognition (ICPR), 2012 21st International Conference on
  • Conference_Location
    Tsukuba
  • ISSN
    1051-4651
  • Print_ISBN
    978-1-4673-2216-4
  • Type
    conf
  • Filename
    6460806
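The middle steps of the pipeline described in the abstract (binarization on each color channel, then connected component analysis to find character candidates) can be illustrated with the minimal sketch below. It is not the authors' implementation: the abstract does not name a binarization method, so Otsu's global threshold is assumed; both text polarities (bright-on-dark and dark-on-bright) and 8-connectivity are also assumptions; and `min_area` / `max_area_ratio` are hypothetical filtering rules. L0-norm smoothing, skew correction and the OCR stage are not covered.

```python
from collections import deque
from typing import List, Set, Tuple

Channel = List[List[int]]        # one 8-bit colour channel, indexed [row][col]
Mask = List[List[bool]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def otsu_threshold(channel: Channel) -> int:
    """Otsu's threshold (assumed method): maximise between-class variance."""
    hist = [0] * 256
    for row in channel:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(256):
        w0 += hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        sum0 += t * hist[t]
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(channel: Channel, threshold: int, bright_text: bool) -> Mask:
    # Foreground is v > threshold for bright text, v <= threshold for dark text.
    return [[(v > threshold) == bright_text for v in row] for row in channel]


def connected_components(mask: Mask) -> List[Tuple[Box, int]]:
    """8-connected labelling by BFS; returns (bounding box, pixel count) pairs."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right, count = r, c, r, c, 0
            while queue:
                y, x = queue.popleft()
                count += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append(((top, left, bottom, right), count))
    return comps


def character_candidates(channels: List[Channel], min_area: int = 4,
                         max_area_ratio: float = 0.5) -> List[Box]:
    """Pool candidate boxes over all channels and both polarities.

    min_area and max_area_ratio are hypothetical filters (drop specks and
    background-sized regions), not rules taken from the paper.
    """
    h, w = len(channels[0]), len(channels[0][0])
    boxes: Set[Box] = set()
    for channel in channels:
        t = otsu_threshold(channel)
        for bright_text in (True, False):
            for box, count in connected_components(binarize(channel, t, bright_text)):
                if min_area <= count <= max_area_ratio * h * w:
                    boxes.add(box)
    return sorted(boxes)


if __name__ == "__main__":
    # Two bright 3x2 strokes in the red channel of a 5x8 image.
    red = [[0] * 8 for _ in range(5)]
    for r in range(1, 4):
        for c in (1, 2, 5, 6):
            red[r][c] = 200
    zeros = [[0] * 8 for _ in range(5)]
    print(character_candidates([red, zeros, zeros]))  # [(1, 1, 3, 2), (1, 5, 3, 6)]
```

Pooling boxes across channels reflects the abstract's point that binarization is done per color channel: a stroke that has little contrast in one channel may still separate cleanly in another. In a fuller pipeline the surviving boxes would then be grouped, deskewed and passed to an OCR engine.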