DocumentCode :
2146120
Title :
A Table Detection Method for Multipage PDF Documents via Visual Separators and Tabular Structures
Author :
Fang, Jing ; Gao, Liangcai ; Bai, Kun ; Qiu, Ruiheng ; Tao, Xin ; Tang, Zhi
Author_Institution :
Inst. of Comput. Sci. & Technol., Peking Univ., Beijing, China
fYear :
2011
fDate :
18-21 Sept. 2011
Firstpage :
779
Lastpage :
783
Abstract :
Table detection is an important task in document analysis and recognition. In this paper, we propose a novel and effective table detection method for PDF documents that uses visual separators and geometric content layout information. The visual separators include both graphic ruling lines and white spaces, so that tables with or without ruling lines can be handled. Furthermore, we detect page columns to assist table region delimitation on pages with complex layouts. Evaluations of our algorithm on an e-Book dataset and a scientific document dataset show competitive performance. Notably, the proposed method has been successfully incorporated into a commercial software package for large-scale Chinese e-Book production.
Keywords :
document handling; electronic publishing; commercial software package; document analysis; document recognition; e-book dataset; geometric content layout information; multipage PDF document; page column detection; scientific document dataset; table detection method; table region delimitation; tabular structure; visual separator; Electronic publishing; Layout; Particle separators; Portable document format; Text analysis; White spaces; PDF documents; ruling lines; separators; table detection; table spotting;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition (ICDAR), 2011 International Conference on
Conference_Location :
Beijing
ISSN :
1520-5363
Print_ISBN :
978-1-4577-1350-7
Electronic_ISBN :
1520-5363
Type :
conf
DOI :
10.1109/ICDAR.2011.304
Filename :
6065417