Title :
Document Information Extraction and Its Evaluation Based on Client's Relevance
Author :
Santosh, K.C. ; Belaid, Abdel
Author_Institution :
LORIA, Univ. de Lorraine, Nancy, France
Abstract :
In this paper, we present a model-based approach to extracting information content from documents and evaluate it in depth based on clients' relevance. Real-world users, i.e., clients, first provide a set of key fields from the document image that they consider important. These fields are used to build a graph in which nodes (i.e., fields) are labelled with dynamic semantics and other features, and edges are attributed with spatial relations. This attributed relational graph (ARG) is then used to mine similar graphs from a document image; each time such graphs are extracted, they are used to reinforce or update the initial graph, iteratively producing a model. Models can therefore be employed in the absence of clients. We have validated the concept and evaluated its scientific impact on a real-world industrial problem, where table extraction is found to be the best-suited application.
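The graph described in the abstract can be pictured with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the `Field` record, the use of bounding-box centres, the four-way relation vocabulary (`left_of`, `right_of`, `above`, `below`), and the function names `spatial_relation` and `build_arg`. The sketch covers only the construction of an attributed relational graph from client-selected fields, not the mining or model-update steps.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Field:
    """A client-selected key field (hypothetical representation)."""
    label: str   # semantic label, e.g. "price"
    x: float     # centre x of the field's bounding box
    y: float     # centre y (image coordinates: y grows downward)


def spatial_relation(a: Field, b: Field) -> str:
    """Coarse relation of b with respect to a, from centre offsets.

    Assumed vocabulary; the paper's actual spatial relations may differ.
    """
    dx, dy = b.x - a.x, b.y - a.y
    if abs(dx) >= abs(dy):
        return "right_of" if dx > 0 else "left_of"
    return "below" if dy > 0 else "above"


def build_arg(fields):
    """Build a toy ARG.

    Returns (nodes, edges): nodes maps label -> Field, and edges maps
    (label_a, label_b) -> relation of b relative to a.
    """
    nodes = {f.label: f for f in fields}
    edges = {(a.label, b.label): spatial_relation(a, b)
             for a, b in permutations(fields, 2)}
    return nodes, edges


# Three fields laid out like the header and footer of a small table.
fields = [Field("item", 10, 50), Field("price", 90, 50), Field("total", 90, 120)]
nodes, edges = build_arg(fields)
```

In this toy layout, `edges[("item", "price")]` is `"right_of"` and `edges[("price", "total")]` is `"below"`. A complete directed graph is used only to keep the sketch short; a real system would likely prune edges or use richer relations.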
Keywords :
document image processing; graph theory; image retrieval; ARG; attributed relational graph; client relevance; document image; dynamic semantics; graph mining; model-based document information content extraction approach; real-world industrial problem; Computational modeling; Data mining; Feature extraction; Measurement; Optical character recognition software; Semantics; Vectors; Document information exploitation; table extraction
Conference_Title :
Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
Conference_Location :
Washington, DC
DOI :
10.1109/ICDAR.2013.16