Title :
Segmentation of complex documents multilevel images: a robust and fast text bodies-headers detection and extraction scheme
Author :
Olivier, Déforges ; Dominique, Barba
Author_Institution :
S.E.I. Lab., IRESTE, Nantes, France
Abstract :
We present a method for segmenting multilevel images of documents. The documents are considered difficult in the sense that they may contain text paragraphs with different orientations and shapes, mixed with graphics and photographs. The proposed method extracts and separates blocks of text lines (printed or handwritten characters) and headers, as well as stroke structures. The generic approach is first based on a multiscale analysis using a pyramid representation of the image. At each level, text is located by a line-border detection scheme. Then, an efficient bottom-up procedure generates bodies (text paragraphs) as the output of algebraic transformations on a set of four directed graphs associated with the topological relationships of physical components.
Keywords :
directed graphs; document image processing; feature extraction; image segmentation; algebraic transformations; bottom-up procedure; complex documents multilevel images segmentation; line borders detection scheme; multiscale analysis; pyramid representation; stroke structures; text bodies-headers detection; text bodies-headers extraction; text paragraphs; Bridges; Data mining; Image segmentation; Merging; Performance analysis; Robustness; Text analysis;
Conference_Title :
Document Analysis and Recognition, 1995., Proceedings of the Third International Conference on
Conference_Location :
Montreal, Que.
Print_ISBN :
0-8186-7128-9
DOI :
10.1109/ICDAR.1995.602016