Title :
Dataset, Ground-Truth and Performance Metrics for Table Detection Evaluation
Author :
Fang, Jing ; Tao, Xin ; Tang, Zhi ; Qiu, Ruiheng ; Liu, Ying
Author_Institution :
Inst. of Comput. Sci. & Technol., Peking Univ., Beijing, China
Abstract :
Table detection is an important task in the field of document analysis. It has been studied extensively for a couple of decades. Various kinds of document media are involved, from scanned images to web pages and from plain text to PDF files. The numerous published algorithms raise a challenging issue: how to evaluate algorithms across different contexts. Currently, most work on table detection conducts experiments on in-house datasets, and the few online datasets that exist target image documents only. Moreover, it is usual practice to measure performance with precision and recall based on human evaluation. In this paper, we provide a dataset that is representative, large and, most importantly, publicly available. The compatible format of the ground truth makes evaluation independent of the document medium. We also propose a set of new measures, implement them, and make the source code publicly available. Finally, three existing table detection algorithms are evaluated to demonstrate the reliability of the dataset and metrics.
Keywords :
document image processing; performance evaluation; source coding; text detection; PDF files; Web pages; document analysis; document medium; ground truth; image documents; online datasets; performance metrics; plain texts; scanned images; source code; table detection evaluation; Algorithm design and analysis; Benchmark testing; Detection algorithms; Layout; Measurement; Portable document format; Text analysis; dataset; ground-truth; performance evaluation; performance metrics; table detection;
Conference_Title :
Document Analysis Systems (DAS), 2012 10th IAPR International Workshop on
Conference_Location :
Gold Coast, QLD
Print_ISBN :
978-1-4673-0868-7
DOI :
10.1109/DAS.2012.29