DocumentCode
2631828
Title
A block segmentation method for document images with complicated column structures
Author
Hirayama, Yuki
Author_Institution
IBM Japan Ltd., Yamato city, Kanagawa, Japan
fYear
1993
fDate
20-22 Oct 1993
Firstpage
91
Lastpage
94
Abstract
Presents a novel block segmentation method for document images that can be applied to various document formats. Some documents have complicated column structures, in which some figure areas have no surrounding rectangles and others cut across text areas. To segment such documents into text and figure areas, the approach analyzes the text areas first and then detects the figure areas from information on the text areas. The overall process is as follows. First, character strings are merged into text groups by analyzing regularity in the text areas. Next, border lines of columns are detected by linking the edges of the text groups. The whole page is then segmented into small blocks according to the border lines, and the blocks are unified by using the column information. Finally, a projection profile method is applied to the unified blocks in order to detect text areas and figure areas. The method was applied to 61 pages of Japanese technical papers and magazines, and 93.3% of the text areas and 93.2% of the figure areas were detected correctly.
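The abstract names a projection profile method as its final step but gives no details. As a rough illustration of the general technique only (not the author's implementation), the sketch below counts foreground pixels per row of a small binary image and splits the rows into bands separated by blank rows. The function names `row_profile` and `split_runs`, the `threshold` parameter, and the toy `page` array are assumptions introduced here for illustration.

```python
from typing import List, Tuple


def row_profile(image: List[List[int]]) -> List[int]:
    """Horizontal projection profile: foreground (1) pixels per row."""
    return [sum(row) for row in image]


def split_runs(profile: List[int], threshold: int = 0) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, end exclusive, where profile > threshold."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i                      # a band of ink begins
        elif value <= threshold and start is not None:
            runs.append((start, i))        # a blank row closes the band
            start = None
    if start is not None:                  # band runs to the last row
        runs.append((start, len(profile)))
    return runs


# Toy binary "page" (hypothetical data): 1 = ink, 0 = background.
page = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 1],
    [1, 0, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

profile = row_profile(page)
bands = split_runs(profile)
print(profile)
print(bands)
```

The same two functions can be applied to the transposed image to obtain a vertical profile. How the paper uses the resulting profiles to distinguish text areas from figure areas is not stated in the abstract and is not reproduced here.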
Keywords
document image processing; image segmentation; merging; Japanese magazines; Japanese technical papers; block segmentation method; block unification; border lines; character strings; complicated column structures; document formats; document images; figure areas; page segmentation; projection profile method; regularity; text areas; text groups; Cities and towns; Databases; Image analysis; Image edge detection; Image segmentation; Information analysis; Joining processes; Laboratories; Publishing; Text recognition;
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition, 1993., Proceedings of the Second International Conference on
Conference_Location
Tsukuba Science City
Print_ISBN
0-8186-4960-7
Type
conf
DOI
10.1109/ICDAR.1993.395775
Filename
395775