DocumentCode
3023907
Title
A model for detecting and merging vertically spanned table cells in plain text documents
Author
Long, Vanessa ; Dale, Robert ; Cassidy, Steve
Author_Institution
Centre for Language Technol., Macquarie Univ., Sydney, NSW, Australia
fYear
2005
fDate
29 Aug.-1 Sept. 2005
Firstpage
1242
Abstract
A spanned cell in a table is a single, complete unit that physically occupies multiple columns and/or multiple rows. Spanned cells are common in tables, and they are a significant cause of error in the extraction of tables from free text documents. In this paper, we present a model for the detection and merging of vertically spanned cells for tables presented in plain text documents. Our model and algorithm are based purely on the layout features of the tables, and they require no semantic understanding of the documents. When tested on the 98 tables appearing in 40 randomly selected documents from a corpus of company announcements from the Australian Stock Exchange (ASX), our algorithm achieves an accuracy of 86.79% in detecting and merging vertically spanned cells.
Keywords
text analysis; document semantic understanding; free text documents; plain text documents; vertically spanned table cell detection; vertically spanned table cell merging; Australia; Data mining; IEEE news; Merging; Robustness; Stock markets; Terminology; Testing; Text analysis
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition, 2005. Proceedings. Eighth International Conference on
ISSN
1520-5263
Print_ISBN
0-7695-2420-6
Type
conf
DOI
10.1109/ICDAR.2005.21
Filename
1575741