• DocumentCode
    3436714
  • Title
    Text extraction from Web images based on a split-and-merge segmentation method using colour perception
  • Author
    Karatzas, D.; Antonacopoulos, A.
  • Author_Institution
    Dept. of Comput. Sci., Liverpool Univ., UK
  • Volume
    2
  • fYear
    2004
  • fDate
    23-26 Aug. 2004
  • Firstpage
    634
  • Abstract
    This paper describes a complete approach to the segmentation and extraction of text from Web images for subsequent recognition, to ultimately achieve both effective indexing and presentation by non-visual means (e.g., audio). The method described here (the first in the authors' systematic approach to exploit human colour perception) enables the extraction of text in complex situations such as in the presence of varying colour (characters and background). More precisely, in addition to using structural features, the segmentation follows a split-and-merge strategy based on the hue-lightness-saturation (HLS) representation of colour as a first approximation of an anthropocentric expression of the differences in chromaticity and lightness. Character-like components are then extracted as forming textlines in a number of orientations and along curves.
  • Keywords
    Internet; image colour analysis; image segmentation; text analysis; Web images; anthropocentric expression; colour perception; hue-lightness-saturation colour representation; split-and-merge segmentation method; text extraction; HTML; Humans; Image color analysis; Image recognition; Image segmentation; Indexing; Pattern recognition; Search engines; Text recognition; Web pages;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on
  • ISSN
    1051-4651
  • Print_ISBN
    0-7695-2128-2
  • Type
    conf
  • DOI
    10.1109/ICPR.2004.1334328
  • Filename
    1334328
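
  • Illustrative sketch (not part of the original record)
    The abstract mentions comparing colours in the hue-lightness-saturation (HLS) representation when deciding how regions are split and merged. The Python sketch below only illustrates that general idea and is not the authors' algorithm: the function names, the two tolerance values, and the rule of comparing hue and lightness alone are assumptions made for this example. It relies on the standard-library `colorsys.rgb_to_hls`, which returns hue, lightness and saturation in the range 0 to 1.

```python
import colorsys

# Hypothetical tolerances, chosen for illustration only; not taken from the paper.
HUE_TOL = 0.05        # fraction of the hue circle
LIGHTNESS_TOL = 0.10  # fraction of the lightness range


def rgb_to_hls(rgb):
    """Convert an 8-bit (r, g, b) triple to (hue, lightness, saturation) in [0, 1]."""
    r, g, b = (c / 255.0 for c in rgb)
    return colorsys.rgb_to_hls(r, g, b)


def hue_distance(h1, h2):
    """Shortest distance between two hues on the unit hue circle."""
    d = abs(h1 - h2)
    return min(d, 1.0 - d)


def could_merge(rgb_a, rgb_b):
    """Toy merge test: True when two colours are close in both hue and lightness."""
    h_a, l_a, _ = rgb_to_hls(rgb_a)
    h_b, l_b, _ = rgb_to_hls(rgb_b)
    return (hue_distance(h_a, h_b) <= HUE_TOL
            and abs(l_a - l_b) <= LIGHTNESS_TOL)


if __name__ == "__main__":
    # Two similar reds versus a red and a blue of equal lightness.
    print(could_merge((200, 30, 30), (210, 40, 35)))
    print(could_merge((200, 30, 30), (30, 30, 200)))
```

    Hue is treated as circular so that values just below 1 and just above 0 count as neighbours. A fuller treatment would also need to handle near-grey colours, whose hue is poorly defined.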