DocumentCode
1994024
Title
Detection, extraction and representation of tables
Author
Ramel, J.-Y. ; Crucianu, M. ; Vincent, N. ; Faure, C.
Author_Institution
Lab. d'Informatique, Univ. de Tours, France
fYear
2003
fDate
3-6 Aug. 2003
Firstpage
374
Abstract
We are concerned with the extraction of tables from exchange format representations of very diverse composite documents. We put forward a flexible representation scheme for complex tables, based on a clear distinction between the physical layout of a table and its logical structure. Relying on this scheme, we develop a new method for detecting and extracting tables by analyzing their graphic lines. To deal with tables that lack all or most graphic marks, one must focus on the regularities of the text elements alone. We propose such a method, based on a multi-level analysis of the layout of text components on a page. It exploits a general graph representation of the relative positions of blocks of text.
Keywords
computer graphics; electronic data interchange; table lookup; text analysis; PDF; Postscript; complex tables; diverse composite documents; exchange format representations; graph representation; graphic lines analysis; graphic marks; logical structure; multilevel analysis; physical layout; table detection; table extraction; tables representation; text blocks; text elements; Data mining; Focusing; Graphics; Image reconstruction; Layout; Page description languages; Text analysis; Visualization; XML;
fLanguage
English
Publisher
ieee
Conference_Titel
Document Analysis and Recognition, 2003. Proceedings. Seventh International Conference on
Print_ISBN
0-7695-1960-1
Type
conf
DOI
10.1109/ICDAR.2003.1227692
Filename
1227692