Title :
Nested segmentation: an approach for layout analysis in document classification
Author :
Hao, Xiaolong ; Wang, Jason T L ; Ng, Peter A.
Author_Institution :
Dept. of Comput. & Inf. Sci., New Jersey Inst. of Technol., Newark, NJ, USA
Abstract :
Office information systems (OISs) support office workers in managing information and assist them in their daily work. Document classification is one of the major functional capabilities of an OIS, and classifying a document can be facilitated by analyzing its layout. A new approach to layout analysis, called nested segmentation, is introduced. The layout relationships among the components of a document are defined in terms of the adjacency of blocks. From this adjacency, an adjacent block graph is constructed, which transforms the nested segmentation problem into a classic minimal cut problem on the graph. In addition, an ordered labeled tree structure (L-S-Tree) is introduced to represent the segmented document for document classification.
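The abstract does not give the adjacency definition, the edge weights, or the L-S-Tree labels, so the following is only a minimal sketch of the general idea under stated assumptions: blocks are hypothetical bounding boxes (x0, y0, x1, y1), two blocks count as adjacent when their gap is at most max_gap, closer blocks get heavier edges, and the minimal cut is found by exhaustive search (practical only for a handful of blocks). The function names, the ("split", ...) / ("leaf", ...) tree encoding, and the max_cut stopping threshold are illustrative choices, not the authors' method.

```python
from itertools import combinations

def gap(a, b):
    """Separation between two boxes (x0, y0, x1, y1); 0 if they touch or overlap."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return max(dx, dy)

def adjacent_block_graph(blocks, max_gap=50):
    """Assumed adjacency rule: link blocks whose gap is <= max_gap.
    Assumed weight: 1 / (1 + gap), so closer blocks are bound more strongly."""
    weights = {}
    for i, j in combinations(range(len(blocks)), 2):
        g = gap(blocks[i], blocks[j])
        if g <= max_gap:
            weights[(i, j)] = 1.0 / (1.0 + g)
    return weights

def minimal_cut(nodes, weights):
    """Exhaustive global minimal cut of the subgraph induced by `nodes`.
    Returns (cost, side_a, side_b); side_a always holds the smallest node id."""
    nodes = sorted(nodes)
    node_set = set(nodes)
    first, rest = nodes[0], nodes[1:]
    best = None
    for r in range(len(rest)):  # r < len(rest) keeps side_b non-empty
        for extra in combinations(rest, r):
            side_a = {first, *extra}
            cost = sum(
                w for (i, j), w in weights.items()
                if i in node_set and j in node_set
                and (i in side_a) != (j in side_a)
            )
            if best is None or cost < best[0]:
                side_b = [n for n in nodes if n not in side_a]
                best = (cost, sorted(side_a), side_b)
    return best

def nested_segmentation(nodes, weights, max_cut=0.25):
    """Recursively split along minimal cuts until a group is a single block
    or its cheapest cut costs more than max_cut (hypothetical stopping rule)."""
    nodes = sorted(nodes)
    if len(nodes) == 1:
        return ("leaf", nodes)
    cost, side_a, side_b = minimal_cut(nodes, weights)
    if cost > max_cut:
        return ("leaf", nodes)
    return ("split",
            nested_segmentation(side_a, weights, max_cut),
            nested_segmentation(side_b, weights, max_cut))

# Toy page: a title (0), two stacked left-column paragraphs (1, 2), a right column (3).
blocks = [(0, 0, 100, 10), (0, 100, 45, 150), (0, 152, 45, 200), (55, 100, 100, 200)]
graph = adjacent_block_graph(blocks)
tree = nested_segmentation(range(len(blocks)), graph)
print(tree)
```

On this toy page the title is separated first because no edge connects it to the body, then the column gutter is cut, and the two closely stacked paragraphs stay together in one leaf. For realistic block counts the exhaustive search would need to be replaced by a polynomial-time minimum cut algorithm.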
Keywords :
document handling; image segmentation; office automation; tree data structures; L-S-Tree; OISs; adjacent block graph; classic minimal cut problem; document classification; layout analysis; layout relationships; nested segmentation; office information systems; office workers; ordered labeled tree structure; segmented document; Classification tree analysis; Coordinate measuring machines; Electronic mail; Information analysis; Information management; Information science; Management information systems; Size measurement; Text analysis
Conference_Title :
Proceedings of the Second International Conference on Document Analysis and Recognition, 1993
Conference_Location :
Tsukuba Science City
Print_ISBN :
0-8186-4960-7
DOI :
10.1109/ICDAR.1993.395723