DocumentCode
3485457
Title
Document Information Extraction and Its Evaluation Based on Client's Relevance
Author
Santosh, K.C. ; Belaid, Abdel
Author_Institution
LORIA, Univ. de Lorraine, Nancy, France
fYear
2013
fDate
25-28 Aug. 2013
Firstpage
35
Lastpage
39
Abstract
In this paper, we present a model-based approach to document information content extraction and evaluate it in depth based on clients' relevance. Real-world users, i.e., clients, first provide a set of key fields from the document image that they consider important. These fields are used to build a graph in which nodes (i.e., fields) are labelled with dynamic semantics and other features, and edges are attributed with spatial relations. Such an attributed relational graph (ARG) is then used to mine similar graphs from a document image; each time similar graphs are extracted, they are used to reinforce or update the initial graph iteratively, in order to produce a model. Models can therefore be employed in the absence of clients. We have validated the concept and evaluated its scientific impact on a real-world industrial problem, where table extraction is found to be the best-suited application.
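As a rough illustration of the attributed relational graph described in the abstract, the following sketch builds a small graph whose nodes are client-selected fields and whose directed edges carry a coarse spatial relation. The `Field` class, the bounding-box representation, the four-way relation vocabulary, and the sample coordinates are all assumptions made for this example; the abstract does not specify the node features or the set of spatial relations the authors use.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Field:
    """Hypothetical key field: a label plus its bounding box in pixels."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def center(self):
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


def spatial_relation(a, b):
    """Coarse position of b relative to a, from box centers (image y grows downward)."""
    (ax, ay), (bx, by) = a.center, b.center
    dx, dy = bx - ax, by - ay
    if abs(dx) >= abs(dy):
        return "right_of" if dx > 0 else "left_of"
    return "below" if dy > 0 else "above"


def build_arg(fields):
    """Return (nodes, edges): nodes keyed by label, directed edges keyed by label pair."""
    nodes = {f.label: f for f in fields}
    edges = {
        (a.label, b.label): spatial_relation(a, b)
        for a, b in permutations(fields, 2)
    }
    return nodes, edges


# Made-up fields, laid out like a column header, a price header, and a total.
fields = [
    Field("item", 10, 100, 110, 120),
    Field("price", 300, 100, 360, 120),
    Field("total", 300, 400, 360, 420),
]
nodes, edges = build_arg(fields)
```

Here `edges[("item", "price")]` records how the price field sits relative to the item field. A graph of this shape could serve as the initial pattern against which candidate field groups in other document images are compared, although the matching and iterative update steps are not sketched here.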
Keywords
document image processing; graph theory; image retrieval; ARG; attributed relational graph; client relevance; document image; dynamic semantics; graph mining; model-based document information content extraction approach; real-world industrial problem; Computational modeling; Data mining; Feature extraction; Measurement; Optical character recognition software; Semantics; Vectors; Document information exploitation; table extraction
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
Conference_Location
Washington, DC
ISSN
1520-5363
Type
conf
DOI
10.1109/ICDAR.2013.16
Filename
6628581