DocumentCode
3211005
Title
Understanding multi-articled documents
Author
Tsujimoto, Shuichi ; Asada, Haruo
Author_Institution
Toshiba Corp., Kawasaki, Japan
Volume
i
fYear
1990
fDate
16-21 Jun 1990
Firstpage
551
Abstract
A document understanding method based on the tree representation of document structures is proposed. It is shown that documents have an obvious hierarchical structure in their geometry, which is represented by a tree. A small number of rules are introduced to transform the geometric structure into the logical structure, which represents the semantics. The virtual field separator technique is employed to utilize the information carried by special constituents of documents such as field separators and frames, keeping the number of transformation rules small. Experimental results on a variety of document formats have shown that the proposed method is applicable to most of the documents commonly encountered in daily use, although there is still room for further refinement of the transformation rules.
Keywords
document image processing; pattern recognition; trees (mathematics); document understanding; geometric structure; logical structure; semantics; tree representation; virtual field separator; Abstracts; Desktop publishing; Humans; Image analysis; Natural languages; Particle separators; Research and development; Sections; Text analysis
fLanguage
English
Publisher
IEEE
Conference_Titel
Pattern Recognition, 1990. Proceedings., 10th International Conference on
Conference_Location
Atlantic City, NJ
Print_ISBN
0-8186-2062-5
Type
conf
DOI
10.1109/ICPR.1990.118163
Filename
118163