Title :
Learning to Detect Tables in Scanned Document Images Using Line Information
Author :
Kasar, T. ; Barlas, Panagiotis ; Adam, S. ; Chatelain, C. ; Paquet, T.
Author_Institution :
Lab. LITIS-EA 4108, Univ. de Rouen, Rouen, France
Abstract :
This paper presents a method to detect table regions in document images by identifying the column and row line-separators and their properties. The method employs a run-length approach to identify the horizontal and vertical lines present in the input image. From each group of intersecting horizontal and vertical lines, a set of 26 low-level features is extracted, and an SVM classifier is used to decide whether the group belongs to a table. The method is evaluated on a heterogeneous corpus of French, English and Arabic documents containing various types of table structures, and its performance is compared with that of the Tesseract OCR system.
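The run-length idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything here is an assumption made for illustration: the function names, the representation of the binarized image as a list of 0/1 rows, and the `min_len` threshold are hypothetical and are not the authors' code. The sketch scans each row (and, via a transpose, each column) for long runs of foreground pixels and then pairs up horizontal and vertical runs that cross.

```python
def horizontal_runs(image, min_len):
    """Return (row, col_start, col_end) for every run of foreground
    pixels (value 1) that is at least min_len long; col_end is inclusive."""
    runs = []
    for r, row in enumerate(image):
        start = None
        for c, px in enumerate(row):
            if px and start is None:
                start = c                      # a new run begins
            elif not px and start is not None:
                if c - start >= min_len:       # run ended at column c - 1
                    runs.append((r, start, c - 1))
                start = None
        # Close a run that reaches the right edge of the image.
        if start is not None and len(row) - start >= min_len:
            runs.append((r, start, len(row) - 1))
    return runs


def vertical_runs(image, min_len):
    """Return (row_start, row_end, col) for long vertical runs by
    transposing the image and reusing the horizontal scan."""
    transposed = [list(col) for col in zip(*image)]
    return [(start, end, c)
            for c, start, end in horizontal_runs(transposed, min_len)]


def intersects(h_run, v_run):
    """True if a horizontal run and a vertical run share a pixel."""
    r, c0, c1 = h_run
    r0, r1, c = v_run
    return c0 <= c <= c1 and r0 <= r <= r1


# Toy binarized image: a rectangular frame, like the border of a one-cell table.
image = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]

h_lines = horizontal_runs(image, min_len=4)
v_lines = vertical_runs(image, min_len=4)
crossings = [(h, v) for h in h_lines for v in v_lines if intersects(h, v)]
```

In a pipeline of the kind the abstract describes, each group of mutually intersecting lines would then be turned into a feature vector and passed to a classifier; the 26 features themselves are not listed in this record, so they are not sketched here.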
Keywords :
document image processing; feature extraction; image classification; support vector machines; Arabic documents; English documents; French documents; SVM classifier; column line-separator identification; heterogeneous corpus; horizontal line identification; horizontal line-vertical line intersection; input image; line information; low-level feature extraction; performance evaluation; row line-separator identification; run-length approach; scanned document images; table region detection; table structures; vertical line identification; Detectors; Feature extraction; Image segmentation; Layout; Measurement; Optical character recognition software; Support vector machines; Table detection; line detection;
Conference_Title :
Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
Conference_Location :
Washington, DC
DOI :
10.1109/ICDAR.2013.240