Title :
Detection, extraction and representation of tables
Author :
Ramel, J.-Y. ; Crucianu, M. ; Vincent, N. ; Faure, C.
Author_Institution :
Lab. d'Informatique, Univ. de Tours, France
Abstract :
We are concerned with the extraction of tables from exchange format representations of very diverse composite documents. We put forward a flexible representation scheme for complex tables, based on a clear distinction between the physical layout of a table and its logical structure. Relying on this scheme, we develop a new method for the detection and the extraction of tables by an analysis of the graphic lines. To deal with tables that lack all or most of the graphic marks, one must focus on the regularities of the text elements alone. We propose such a method, based on a multi-level analysis of the layout of text components on a page. A general graph representation of the relative positions of blocks of text is exploited.
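The graph of relative block positions mentioned in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `Block` type, the `build_graph` function, the "right"/"below" edge labels, and the rule that links each block to its nearest neighbour whose projection overlaps on the other axis.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical text block with a bounding box; y grows downward."""
    name: str
    x0: float
    y0: float
    x1: float
    y1: float


def overlaps(a0, a1, b0, b1):
    """True when the intervals [a0, a1] and [b0, b1] share a positive length."""
    return min(a1, b1) - max(a0, b0) > 0


def build_graph(blocks):
    """Link each block to its nearest right and nearest lower neighbour.

    Assumed rule: a neighbour to the right must overlap vertically,
    a neighbour below must overlap horizontally.
    """
    graph = defaultdict(dict)
    for a in blocks:
        right = [b for b in blocks
                 if b is not a and b.x0 >= a.x1
                 and overlaps(a.y0, a.y1, b.y0, b.y1)]
        below = [b for b in blocks
                 if b is not a and b.y0 >= a.y1
                 and overlaps(a.x0, a.x1, b.x0, b.x1)]
        if right:
            graph[a.name]["right"] = min(right, key=lambda b: b.x0).name
        if below:
            graph[a.name]["below"] = min(below, key=lambda b: b.y0).name
    return dict(graph)


# Four blocks laid out as a 2 x 2 grid without any ruling lines.
cells = [
    Block("A", 0, 0, 10, 5), Block("B", 20, 0, 30, 5),
    Block("C", 0, 10, 10, 15), Block("D", 20, 10, 30, 15),
]
graph = build_graph(cells)
```

In such a graph, chains of "right" edges would suggest candidate rows and chains of "below" edges candidate columns, which is one way regularities in text layout alone could hint at a table when graphic marks are absent.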
Keywords :
computer graphics; electronic data interchange; table lookup; text analysis; PDF; PostScript; complex tables; diverse composite documents; exchange format representations; graph representation; graphic lines analysis; graphic marks; logical structure; multilevel analysis; physical layout; table detection; table extraction; tables representation; text blocks; text elements; Data mining; Focusing; Graphics; Image reconstruction; Layout; Page description languages; Text analysis; Visualization; XML
Conference_Title :
Document Analysis and Recognition, 2003. Proceedings. Seventh International Conference on
Print_ISBN :
0-7695-1960-1
DOI :
10.1109/ICDAR.2003.1227692