DocumentCode :
2144644
Title :
Table Content Understanding in SmartFIX
Author :
Deckert, S. ; Seidler, Benjamin ; Ebbecke, Markus ; Gillmann, Michael
Author_Institution :
Insiders Technol. GmbH, Kaiserslautern, Germany
fYear :
2011
fDate :
18-21 Sept. 2011
Firstpage :
488
Lastpage :
492
Abstract :
The analysis of table structures and the retrieval of table contents are widely agreed to be difficult challenges in the area of document analysis systems. Instead of extracting the layout of tables, we are interested in understanding their content. In this paper, we present and discuss the smartFIX approach to table recognition and content extraction. Rather than relying on layout features only, we recognize tables by taking into account the presence and semantics of data entities that we expect to find contained in a table. The relationship of a document, including a table, to a specific business process aids in shaping helpful knowledge and expectations about the table's content. smartFIX is a commercial document analysis system complying with the full range of industrial requirements. Therefore, smartFIX must locate the tables and extract their business-process-relevant information with high reliability.
Keywords :
business data processing; content management; document handling; information retrieval; pattern recognition; business process aids; content extraction; data entities; document analysis systems; document capturing systems; layout features; semantics; smartFIX; table content understanding; table contents retrieval; table recognition; table structures analysis; Business; Databases; Layout; Measurement; Semantics; Text analysis; document analysis; smartFIX; table analysis; table content extraction; table recognition; table understanding;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Document Analysis and Recognition (ICDAR), 2011 International Conference on
Conference_Location :
Beijing
ISSN :
1520-5363
Print_ISBN :
978-1-4577-1350-7
Electronic_ISBN :
1520-5363
Type :
conf
DOI :
10.1109/ICDAR.2011.104
Filename :
6065359