Title :
Document analysis in gray level and typography extraction using character pattern redundancies
Author :
LeBourgeois, F. ; Emptoz, H.
Author_Institution :
Lab. de Reconnaissance de Formes et Vision, Inst. Nat. des Sci. Appliquees, Villeurbanne, France
Abstract :
The paper describes the processing of magazine and newspaper images that need to be segmented in gray level. The first part proposes an original method to extract the physical layout of gray-level documents. The second part derives a rough logical structure by analyzing the typography. The aim is to extract relevant information about the logical layout by combining information about colors, typography, and the physical layout, for use by an automatic document indexation system. Character prototypes are extracted automatically by grouping characters that share the same binary pattern. We suggest using this character-grouping method to extract typographical information and to recognize the different font styles and sizes used in the document.
Keywords :
character recognition; character sets; document image processing; feature extraction; image segmentation; indexing; automatic document indexation system; binary patterns; character pattern redundancies; character prototypes; character-grouping method; document analysis; font styles; gray level; gray-level documents; logical layout; magazine; newspaper images; physical layout; physical structural layout; relevant information extraction; rough logical structure; typographical information; typography extraction; Color; Data mining; Identity-based encryption; Image segmentation; Optical character recognition software; Pattern analysis; Prototypes; Reconnaissance; Text analysis; Voting
Conference_Title :
Document Analysis and Recognition, 1999. ICDAR '99. Proceedings of the Fifth International Conference on
Conference_Location :
Bangalore
Print_ISBN :
0-7695-0318-7
DOI :
10.1109/ICDAR.1999.791753