Title :
Background variability modeling for statistical layout analysis
Author :
Shafait, Faisal ; Van Beusekom, Joost ; Keysers, Daniel ; Breuel, Thomas M.
Author_Institution :
Image Understanding & Pattern Recognition (IUPR) Research Group, German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany
Abstract :
Geometric layout analysis plays an important role in document image understanding. Many algorithms known in the literature work well on standard document images, achieving high text line segmentation accuracy on the UW-III dataset. However, these algorithms rely on certain assumptions about document layouts and fail when those assumptions are not met. Moreover, they do not provide confidence scores for their output. These two problems limit the usefulness of general-purpose layout analysis methods in large-scale applications. In this contribution, we propose a statistically motivated, model-based, trainable layout analysis system that allows assumption-free adaptation to different layout types and produces likelihood estimates of the correctness of the computed page segmentation. The performance of our approach was tested on a subset of the Google 1000 books dataset, where it achieved a text line segmentation accuracy of 98.4% on layouts that other general-purpose algorithms failed to segment correctly.
Keywords :
document image processing; image segmentation; statistical analysis; text analysis; UW-III dataset; background variability modeling; document image; document layouts; page segmentation; statistical layout analysis; text line segmentation; Artificial intelligence; Image analysis; Image segmentation; Large-scale systems; Pattern analysis; Pattern recognition; Solid modeling; Stochastic processes; Testing; Training data;
Conference_Title :
Pattern Recognition, 2008. ICPR 2008. 19th International Conference on
Conference_Location :
Tampa, FL
Print_ISBN :
978-1-4244-2174-9
ISSN :
1051-4651
DOI :
10.1109/ICPR.2008.4760964