Title :
A Table Detection Method for Multipage PDF Documents via Visual Separators and Tabular Structures
Author :
Fang, Jing ; Gao, Liangcai ; Bai, Kun ; Qiu, Ruiheng ; Tao, Xin ; Tang, Zhi
Author_Institution :
Inst. of Comput. Sci. & Technol., Peking Univ., Beijing, China
Abstract :
Table detection is an important task in document analysis and recognition. In this paper, we propose a novel and effective table detection method for PDF documents that uses visual separators and geometric content layout information. The visual separators include not only graphic ruling lines but also white spaces, so that tables both with and without ruling lines can be handled. Furthermore, we detect page columns to assist table region delimitation in pages with complex layouts. Evaluations of our algorithm on an e-Book dataset and a scientific document dataset show competitive performance. Notably, the proposed method has been successfully incorporated into a commercial software package for large-scale Chinese e-Book production.
Keywords :
document handling; electronic publishing; commercial software package; document analysis; document recognition; e-book dataset; geometric content layout information; multipage PDF document; page column detection; scientific document dataset; table detection method; table region delimitation; tabular structure; visual separator; Electronic publishing; Layout; Particle separators; Portable document format; Text analysis; White spaces; PDF documents; ruling lines; separators; table detection; table spotting;
Conference_Title :
Document Analysis and Recognition (ICDAR), 2011 International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4577-1350-7
ISSN :
1520-5363
DOI :
10.1109/ICDAR.2011.304