• DocumentCode
    1479198
  • Title
    Scanned Compound Document Encoding Using Multiscale Recurrent Patterns
  • Author
    Francisco, Nelson C. ; Rodrigues, Nuno M M ; da Silva, E.A.B. ; De Carvalho, Murilo Bresciani ; De Faria, Sérgio M M
  • Author_Institution
    Inst. de Telecomun., Leiria, Portugal
  • Volume
    19
  • Issue
    10
  • fYear
    2010
  • Firstpage
    2712
  • Lastpage
    2724
  • Abstract
    In this paper, we propose a new encoder for scanned compound documents, based upon a recently introduced coding paradigm called multidimensional multiscale parser (MMP). MMP uses approximate pattern matching, with adaptive multiscale dictionaries that contain concatenations of scaled versions of previously encoded image blocks. These features give MMP the ability to adjust to the input image's characteristics, resulting in high coding efficiencies for a wide range of image types. This versatility makes MMP a good candidate for compound digital document encoding. The proposed algorithm first classifies the image blocks as smooth (texture) and nonsmooth (text and graphics). Smooth and nonsmooth blocks are then compressed using different MMP-based encoders, adapted for encoding either type of blocks. The adaptive use of these two types of encoders resulted in performance gains over the original MMP algorithm, further increasing the performance advantage over the current state-of-the-art image encoders for scanned compound images, without compromising the performance for other image types.
  • Keywords
    document image processing; encoding; grammars; image coding; adaptive multiscale dictionaries; approximate pattern matching; concatenations; encoded image blocks; multidimensional multiscale parser; multiscale recurrent patterns; scanned compound document encoding; Adaptive pattern matching; compound images; dictionary based coding; scanned document compression; vector quantization;
  • fLanguage
    English
  • Journal_Title
    Image Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1057-7149
  • Type
    jour
  • DOI
    10.1109/TIP.2010.2049181
  • Filename
    5454328
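The abstract above describes MMP only at a high level: blocks are approximated by scaled dictionary atoms, and the dictionary grows with concatenations of previously reconstructed blocks. The following is a minimal Python sketch of that recurrent-pattern idea under stated assumptions; it is not the authors' encoder. The nearest-neighbour `scale`, the fixed SSE `threshold` (standing in for MMP's rate-distortion cost), the split-along-the-larger-dimension rule, and the `is_smooth` classifier with its `max_step` parameter are all hypothetical simplifications, and entropy coding of the emitted symbols is omitted.

```python
from typing import List, Tuple

Block = List[List[int]]  # 2-D block of pixel values


def scale(block: Block, h: int, w: int) -> Block:
    """Resample `block` to h x w by nearest neighbour (a simplification,
    not MMP's actual scale transformation)."""
    bh, bw = len(block), len(block[0])
    return [[block[(i * bh) // h][(j * bw) // w] for j in range(w)]
            for i in range(h)]


def sse(a: Block, b: Block) -> int:
    """Sum of squared errors between two equally sized blocks."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def split(block: Block) -> Tuple[Block, Block, bool]:
    """Halve the block along its larger dimension; the flag is True for a row split."""
    h, w = len(block), len(block[0])
    if h >= w:
        m = h // 2
        return block[:m], block[m:], True
    m = w // 2
    return [r[:m] for r in block], [r[m:] for r in block], False


def concat(a: Block, b: Block, by_rows: bool) -> Block:
    """Join two reconstructed halves back into one block."""
    if by_rows:
        return [list(r) for r in a] + [list(r) for r in b]
    return [list(ra) + list(rb) for ra, rb in zip(a, b)]


def mmp_encode(block: Block, dictionary: List[Block], threshold: int,
               symbols: List[tuple]) -> Block:
    """Approximate `block` with a scaled atom from the (non-empty) dictionary;
    if no atom is within `threshold`, split the block, encode both halves,
    and add their concatenation to the dictionary. Emitted symbols are
    appended to `symbols`; the reconstruction is returned."""
    h, w = len(block), len(block[0])
    best_i, best_err, best_rec = 0, None, None
    for i, atom in enumerate(dictionary):
        rec = scale(atom, h, w)  # atoms of any size are compared at this scale
        err = sse(block, rec)
        if best_err is None or err < best_err:
            best_i, best_err, best_rec = i, err, rec
    if best_err <= threshold or (h == 1 and w == 1):
        symbols.append(("match", best_i))
        return best_rec
    symbols.append(("split",))
    a, b, by_rows = split(block)
    rec = concat(mmp_encode(a, dictionary, threshold, symbols),
                 mmp_encode(b, dictionary, threshold, symbols), by_rows)
    dictionary.append(rec)  # new pattern, later reusable at any scale
    return rec


def is_smooth(block: Block, max_step: int = 32) -> bool:
    """Hypothetical classifier: smooth if no horizontal or vertical
    neighbour difference exceeds max_step."""
    h, w = len(block), len(block[0])
    for i in range(h):
        for j in range(w):
            if j + 1 < w and abs(block[i][j] - block[i][j + 1]) > max_step:
                return False
            if i + 1 < h and abs(block[i][j] - block[i + 1][j]) > max_step:
                return False
    return True
```

For example, encoding `[[0, 0, 255, 255], [0, 0, 255, 255]]` with the dictionary `[[[0]], [[255]]]` and `threshold=0` emits one split and two matches; encoding the same block again matches the newly added atom (index 2) directly, which illustrates the reuse of recurrent patterns. A two-encoder scheme in the spirit of the abstract could route each block through `is_smooth` first and hand smooth and nonsmooth blocks to separately parameterized encoders, each keeping its own dictionary; the decoder can mirror every dictionary update because it reconstructs the same blocks from the same symbols.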