• DocumentCode
    1994302
  • Title
    Word searching in CCITT group 4 compressed document images
  • Author
    Lu, Yue; Tan, Chew Lim
  • Author_Institution
    Comput. Sci. Dept., Singapore Nat. Univ., Kent Ridge, Singapore
  • fYear
    2003
  • fDate
    3-6 Aug. 2003
  • Firstpage
    467
  • Abstract
    In this paper, we present a compressed pattern matching method for searching for user-queried words in CCITT Group 4 compressed document images without decompressing them. The feature pixels, composed of black changing elements and white changing elements, are extracted directly from the compressed images. Connected components are labeled line by line according to the relative positions of the changing elements on the current coding line and those on the reference line. Word bounding boxes are obtained by merging the connected components. A two-stage matching strategy measures the dissimilarity between the template image of the user's query word and the words extracted from the document images. Experimental results confirmed the validity of the proposed approach.
  • Keywords
    character recognition; document image processing; image coding; image matching; black changing elements; coding line; compressed document images; compressed pattern matching; line-by-line strategy; reference line; user queried words; white changing elements; word boxes; word searching; Computer science; Image coding; Image recognition; Image storage; Internet; Merging; Optical character recognition software; Pattern matching; Pixel; Software libraries;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 2003. Proceedings. Seventh International Conference on
  • Print_ISBN
    0-7695-1960-1
  • Type
    conf
  • DOI
    10.1109/ICDAR.2003.1227709
  • Filename
    1227709
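
The line-by-line connected component labeling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give their exact rules, so this version assumes each scan line has already been reduced to a sorted list of changing-element columns (white-to-black at even indices, black-to-white at odd indices), and it merges runs with a generic union-find under 8-connectivity. The names `black_runs` and `component_boxes` are hypothetical.

```python
def black_runs(changes):
    """Pair changing elements into black runs [start, end) on one scan line.

    Assumption: the line starts white, so even-indexed changes open a
    black run and odd-indexed changes close it.
    """
    return list(zip(changes[0::2], changes[1::2]))


def component_boxes(rows):
    """Label black runs line by line and return component bounding boxes.

    rows: one list of changing-element columns per scan line.
    Returns a sorted list of (x_min, y_min, x_max, y_max), inclusive.
    """
    parent = []  # union-find forest over provisional labels

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    runs = []       # (label, y, start, end) for every black run seen
    reference = []  # (start, end, label) runs of the reference line
    for y, changes in enumerate(rows):
        current = []
        for start, end in black_runs(changes):
            label = None
            for ref_start, ref_end, ref_label in reference:
                # 8-connectivity test between half-open runs on adjacent lines
                if start <= ref_end and ref_start <= end:
                    root = find(ref_label)
                    if label is None:
                        label = root
                    else:
                        parent[root] = find(label)  # merge the two components
            if label is None:
                label = len(parent)  # no neighbor above: new component
                parent.append(label)
            current.append((start, end, label))
            runs.append((label, y, start, end))
        reference = current  # the coding line becomes the next reference line

    boxes = {}
    for label, y, start, end in runs:
        root = find(label)
        x0, y0, x1, y1 = boxes.get(root, (start, y, end - 1, y))
        boxes[root] = (min(x0, start), min(y0, y), max(x1, end - 1), max(y1, y))
    return sorted(boxes.values())


if __name__ == "__main__":
    # A "U" shape: two vertical strokes joined by a bottom bar.
    print(component_boxes([[1, 3, 6, 8], [1, 2, 7, 8], [1, 8]]))
```

In a real Group 4 stream the changing elements would have to be recovered from the pass, horizontal, and vertical mode codes; this sketch starts after that step. Merging the resulting boxes into word boxes and the two-stage matching are not covered here.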