• DocumentCode
    3485457
  • Title

    Document Information Extraction and Its Evaluation Based on Client's Relevance

  • Author

    Santosh, K.C. ; Belaid, Abdel

  • Author_Institution
    LORIA, Univ. de Lorraine, Nancy, France
  • fYear
    2013
  • fDate
    25-28 Aug. 2013
  • Firstpage
    35
  • Lastpage
    39
  • Abstract
    In this paper, we present a model-based approach to extracting information content from documents and perform an in-depth evaluation based on clients' relevance. Real-world users, i.e., clients, first provide a set of key fields from the document image that they consider important. These fields are used to build a graph in which nodes (i.e., fields) are labelled with dynamic semantics and other features, and edges are attributed with spatial relations. Such an attributed relational graph (ARG) is then used to mine similar graphs from a document image; each time similar graphs are extracted, they are used to reinforce or update the initial graph iteratively, in order to produce a model. Models can therefore be employed in the absence of clients. We have validated the concept and evaluated its scientific impact on a real-world industrial problem, where table extraction is found to be the best-suited application.
  • Keywords
    document image processing; graph theory; image retrieval; ARG; attributed relational graph; client relevance; document image; dynamic semantics; graph mining; model-based document information content extraction approach; real-world industrial problem; Computational modeling; Data mining; Feature extraction; Measurement; Optical character recognition software; Semantics; Vectors; Document information exploitation; table extraction
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
  • Conference_Location
    Washington, DC
  • ISSN
    1520-5363
  • Type

    conf

  • DOI
    10.1109/ICDAR.2013.16
  • Filename
    6628581