• DocumentCode
    2472103
  • Title
    Background variability modeling for statistical layout analysis
  • Author
    Shafait, Faisal ; Van Beusekom, Joost ; Keysers, Daniel ; Breuel, Thomas M.
  • Author_Institution
    Image Understanding & Pattern Recognition (IUPR) Res. Group, German Res. Center for Artificial Intell. (DFKI), Kaiserslautern, Germany
  • fYear
    2008
  • fDate
    8-11 Dec. 2008
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    Geometric layout analysis plays an important role in document image understanding. Many algorithms known in the literature work well on standard document images, achieving high text line segmentation accuracy on the UW-III dataset. However, these algorithms rely on certain assumptions about document layouts and fail when those assumptions are not met. They also do not provide confidence scores for their output. These two problems limit the usefulness of general-purpose layout analysis methods in large-scale applications. In this contribution, we propose a statistically motivated, model-based, trainable layout analysis system that allows assumption-free adaptation to different layout types and produces likelihood estimates of the correctness of the computed page segmentation. We tested our approach on a subset of the Google 1000 books dataset, where it achieved a text line segmentation accuracy of 98.4% on layouts for which other general-purpose algorithms failed to produce a correct segmentation.
  • Keywords
    document image processing; image segmentation; statistical analysis; text analysis; UW-III dataset; background variability modeling; document image; document layouts; page segmentation; statistical layout analysis; text line segmentation; Artificial intelligence; Image analysis; Image segmentation; Large-scale systems; Pattern analysis; Pattern recognition; Solid modeling; Stochastic processes; Testing; Training data;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    19th International Conference on Pattern Recognition (ICPR 2008)
  • Conference_Location
    Tampa, FL
  • ISSN
    1051-4651
  • Print_ISBN
    978-1-4244-2174-9
  • Electronic_ISBN
    1051-4651
  • Type
    conf
  • DOI
    10.1109/ICPR.2008.4760964
  • Filename
    4760964