• DocumentCode
    1994024
  • Title
    Detection, extraction and representation of tables
  • Author
    Ramel, J.-Y. ; Crucianu, M. ; Vincent, N. ; Faure, C.
  • Author_Institution
    Lab. d'Informatique, Univ. de Tours, France
  • fYear
    2003
  • fDate
    3-6 Aug. 2003
  • Firstpage
    374
  • Abstract
    We are concerned with the extraction of tables from exchange format representations of very diverse composite documents. We put forward a flexible representation scheme for complex tables, based on a clear distinction between the physical layout of a table and its logical structure. Relying on this scheme, we develop a new method for the detection and the extraction of tables by an analysis of the graphic lines. To deal with tables that lack all or most of the graphic marks, one must focus on the regularities of the text elements alone. We propose such a method, based on a multi-level analysis of the layout of text components on a page. A general graph representation of the relative positions of blocks of text is exploited.
  • Keywords
    computer graphics; electronic data interchange; table lookup; text analysis; PDF; Postscript; complex tables; diverse composite documents; exchange format representations; graph representation; graphic lines analysis; graphic marks; logical structure; multilevel analysis; physical layout; table detection; table extraction; tables representation; text blocks; text elements; Data mining; Focusing; Graphics; Image reconstruction; Layout; Page description languages; Text analysis; Visualization; XML;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 2003. Proceedings. Seventh International Conference on
  • Print_ISBN
    0-7695-1960-1
  • Type
    conf
  • DOI
    10.1109/ICDAR.2003.1227692
  • Filename
    1227692