Title :
Automated analysis of line plots in documents
Author :
Rathin Radhakrishnan Nair;Nishant Sankaran;Ifeoma Nwogu;Venu Govindaraju
Author_Institution :
Department of Computer Science and Engineering, University at Buffalo, NY 14260-1660, USA
Abstract :
Information graphics, such as graphs and plots, are used in technical documents to convey information to readers and to aid understanding. Graphics are usually a key component of a technical document because they let the author present complex ideas in a simplified visual format. However, automatic text recognition systems, which are typically used to digitize documents, lose the ideas conveyed in graphical form. We contend that the information extracted from these graphics can help readers better understand the ideas conveyed in the document. In scientific papers, line plots are the most commonly used graphic for presenting experimental results, showing the correlation between the values represented on the axes. Our contribution is a series of image processing algorithms that automatically extract relevant information, including text and plot data, from graphics found in technical documents. We validate the approach through experiments on a dataset of line plots taken from computer science conference papers, and we evaluate how much a reconstructed curve deviates from the original curve. Our algorithm achieves a classification accuracy of 91% across the dataset and successfully extracts the axes from 92% of line plots. Axis label extraction and line curve tracing succeed in about half of the line plots.
Keywords :
"Three-dimensional displays","Accuracy","Image color analysis"
Conference_Title :
Document Analysis and Recognition (ICDAR), 2015 13th International Conference on
DOI :
10.1109/ICDAR.2015.7333871