• DocumentCode
    3211005
  • Title
    Understanding multi-articled documents
  • Author
    Tsujimoto, Shuichi ; Asada, Haruo
  • Author_Institution
    Toshiba Corp., Kawasaki, Japan
  • Volume
    i
  • fYear
    1990
  • fDate
    16-21 Jun 1990
  • Firstpage
    551
  • Abstract
    A document understanding method based on the tree representation of document structures is proposed. It is shown that documents have an obvious hierarchical structure in their geometry which is represented by a tree. A small number of rules are introduced to transform the geometric structure into the logical structure which represents the semantics. The virtual field separator technique is employed to utilize the information carried by special constituents of documents such as field separators and frames, keeping the number of transformation rules small. Experimental results on a variety of document formats have shown that the proposed method is applicable to most of the documents commonly encountered in daily use, although there is still room for further refinement of the transformation rules.
  • Keywords
    document image processing; pattern recognition; trees (mathematics); document understanding; geometric structure; logical structure; semantics; tree representation; virtual field separator; Abstracts; Desktop publishing; Humans; Image analysis; Natural languages; Particle separators; Research and development; Sections; Text analysis
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pattern Recognition, 1990. Proceedings., 10th International Conference on
  • Conference_Location
    Atlantic City, NJ
  • Print_ISBN
    0-8186-2062-5
  • Type
    conf
  • DOI
    10.1109/ICPR.1990.118163
  • Filename
    118163