Title :
A Mixed Approach for Handwritten Documents Structural Analysis
Author :
Malleron, Vincent ; Eglin, Véronique
Author_Institution :
LIRIS, Univ. de Lyon, Lyon, France
Abstract :
In this paper we propose a new method for document page segmentation. Initially designed for handwritten documents, the method extracts the different text zones, paragraphs, and fragments in unconstrained documents. The approach is a mixed one that combines the advantages of top-down and bottom-up approaches. We evaluate the method on a database of 183 documents taken from a 19th-century handwritten corpus: the "dossiers de Bouvard et Pécuchet" by Flaubert. This evaluation shows that combining the top-down and bottom-up approaches improves the results.
Keywords :
document image processing; handwritten character recognition; image segmentation; text analysis; visual databases; bottom-up approach; document database; document page segmentation; handwritten corpus; handwritten document structural analysis; text zones; top-down approach; unconstrained documents; Algorithm design and analysis; Image segmentation; Layout; Text analysis; Transforms; White spaces; handwritten; layout segmentation; logical structure; physical structure;
Conference_Title :
Document Analysis and Recognition (ICDAR), 2011 International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4577-1350-7
Electronic_ISBN :
1520-5363
DOI :
10.1109/ICDAR.2011.62