Title :
A Distance-Based Technique for Non-Manhattan Layout Analysis
Author :
Ferilli, Stefano ; Biba, Marenglen ; Esposito, Floriana ; Basile, Teresa M A
Author_Institution :
Dipt. di Inf., Univ. degli Studi di Bari, Bari, Italy
Abstract :
Layout analysis is a fundamental step in automatic document processing, and many different techniques have been proposed to perform it. Some follow a top-down approach: they start by identifying the high-level components of the page structure and then recursively split them until basic blocks are found. Bottom-up approaches, on the other hand, start with the smallest elements (e.g., the pixels in the case of digitized documents) and then recursively merge them into higher-level components. A first limitation of such methods is that most of them are designed to deal only with digitized documents, and hence are not applicable to natively digital documents, which are now pervasive. Furthermore, top-down methods and most bottom-up methods can process only documents with a Manhattan layout. In this work, we propose a general bottom-up strategy to tackle the layout analysis of (possibly) non-Manhattan documents, along with two specializations of it that handle bitmap and PS/PDF sources, respectively. The strategy was successfully embedded and tested in the DOMINUS document management system.
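As an illustration of the general bottom-up idea described in the abstract, the following minimal Python sketch merges basic blocks whose mutual distance falls below a threshold. It is not the authors' algorithm: the function names `box_distance` and `group_blocks`, the `max_gap` parameter, and the choice of the Euclidean gap between axis-aligned bounding boxes are all assumptions made for this example. Groups are kept as sets of blocks rather than collapsed into one enclosing rectangle, which is what lets a group take a non-rectangular (for instance L-shaped) outline.

```python
from math import hypot


def box_distance(a, b):
    """Euclidean gap between two boxes (x0, y0, x1, y1); 0 if they touch or overlap."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return hypot(dx, dy)


def group_blocks(boxes, max_gap):
    """Bottom-up grouping: merge boxes that are transitively within max_gap.

    Returns a sorted list of groups, each a sorted list of box indices.
    The threshold rule is a hypothetical stand-in, not the paper's criterion.
    """
    parent = list(range(len(boxes)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if box_distance(boxes[i], boxes[j]) <= max_gap:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # Three boxes arranged in an L shape plus one distant box.
    boxes = [(0, 0, 10, 10), (12, 0, 20, 10), (0, 12, 10, 30), (100, 100, 110, 110)]
    print(group_blocks(boxes, max_gap=3))
```

The pairwise comparison is quadratic in the number of blocks, which is acceptable for a sketch; a real system would likely need a spatial index and a more careful distance criterion.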
Keywords :
document image processing; DOMINUS document management system; PS/PDF image; automatic document image processing; bottom-up approach; digital document image; distance-based technique; non-Manhattan layout analysis; page structure; top-down approach; System testing; Document Image Understanding; Layout Analysis
Conference_Title :
Document Analysis and Recognition, 2009. ICDAR '09. 10th International Conference on
Conference_Location :
Barcelona
Print_ISBN :
978-1-4244-4500-4
Electronic_ISBN :
1520-5363
DOI :
10.1109/ICDAR.2009.37