• DocumentCode
    2144644
  • Title
    Table Content Understanding in SmartFIX
  • Author
    Deckert, S. ; Seidler, Benjamin ; Ebbecke, Markus ; Gillmann, Michael
  • Author_Institution
    Insiders Technol. GmbH, Kaiserslautern, Germany
  • fYear
    2011
  • fDate
    18-21 Sept. 2011
  • Firstpage
    488
  • Lastpage
    492
  • Abstract
    The analysis of table structures and the retrieval of table contents are widely agreed to be a difficult challenge in the area of document analysis systems. Instead of extracting the layout of tables, we are interested in understanding their content. In this paper, we present and discuss the smartFIX approach to table recognition and content extraction. Rather than relying on layout features alone, we recognize tables by taking into account the presence and semantics of the data entities we expect a table to contain. The relationship of a document, including its tables, to a specific business process helps shape useful knowledge and expectations about the table's content. smartFIX is a commercial document analysis system that must meet the full range of industrial requirements. It must therefore locate tables and extract their business-process-relevant information with high reliability.
  • Keywords
    business data processing; content management; document handling; information retrieval; pattern recognition; business process aids; content extraction; data entities; document analysis systems; document capturing systems; layout features; semantics; smartFIX; table content understanding; table contents retrieval; table recognition; table structures analysis; Business; Databases; Layout; Measurement; Semantics; Text analysis; document analysis; smartFIX; table analysis; table content extraction; table recognition; table understanding;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition (ICDAR), 2011 International Conference on
  • Conference_Location
    Beijing
  • ISSN
    1520-5363
  • Print_ISBN
    978-1-4577-1350-7
  • Electronic_ISBN
    1520-5363
  • Type
    conf
  • DOI
    10.1109/ICDAR.2011.104
  • Filename
    6065359