• DocumentCode
    2180116
  • Title
    Supporting information extraction from printed documents by Lexico-Semantic pattern matching
  • Author
    Wenzel, Claudia
  • Author_Institution
    German Res. Center for Artificial Intelligence, Kaiserslautern, Germany
  • Volume
    2
  • fYear
    1997
  • fDate
    18-20 Aug 1997
  • Firstpage
    732
  • Abstract
    Document analysis and understanding (DAU) systems aim not only at recognizing text and document structures but also at extracting relevant information from a scanned document. Depending on the document class, the information to be extracted may be defined in advance in terms of syntactic as well as semantic structures. In this paper we present a system that detects such information and transforms it into a semantic representation. Its basic component is a pattern matcher that takes geometric positions into account when detecting phrases in the document. By using a Levenshtein distance, the matcher accepts approximate matches, which makes it tolerant of OCR errors.
  • Keywords
    image recognition; information retrieval; knowledge acquisition; pattern matching; Levenshtein distance; Lexico-Semantic pattern matching; document analysis and understanding systems; document structures; information extraction; printed documents; semantic representation; semantic structures; text recognition; Artificial intelligence; Data mining; Information analysis; Optical character recognition software; Pattern analysis; Pattern matching; Pattern recognition; Text analysis; Text recognition; Workflow management software;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 1997., Proceedings of the Fourth International Conference on
  • Conference_Location
    Ulm
  • Print_ISBN
    0-8186-7898-4
  • Type
    conf
  • DOI
    10.1109/ICDAR.1997.620605
  • Filename
    620605
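The abstract describes a pattern matcher that uses geometric positions to locate phrases and a Levenshtein distance to tolerate OCR errors. The following is a minimal sketch of that general idea, not the system described in the paper. The `Word` record, the helpers `fuzzy_equal` and `find_phrase`, the length-relative threshold `max_ratio`, and the geometric check (same line, left-to-right, bounded step `max_step`) are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


@dataclass
class Word:
    """Hypothetical OCR token: recognized text plus a simplified position."""
    text: str
    x: int  # left edge of the bounding box
    y: int  # line coordinate; equal y means the same text line


def fuzzy_equal(expected: str, observed: str, max_ratio: float = 0.25) -> bool:
    """Case-insensitive match allowing about max_ratio errors per character,
    rounded down (so very short words must match exactly)."""
    allowed = int(len(expected) * max_ratio)
    return levenshtein(expected.lower(), observed.lower()) <= allowed


def find_phrase(words: List[Word], phrase: str,
                max_step: int = 120, max_ratio: float = 0.25) -> Optional[int]:
    """Index of the first word where `phrase` matches consecutive OCR words
    that lie on one line, left to right, with left edges at most `max_step`
    apart; None if no such occurrence exists."""
    tokens = phrase.split()
    for start in range(len(words) - len(tokens) + 1):
        window = words[start:start + len(tokens)]
        # lexical test: every token matches within its error budget
        if not all(fuzzy_equal(t, w.text, max_ratio)
                   for t, w in zip(tokens, window)):
            continue
        # geometric test: one line, increasing x, bounded horizontal step
        same_line = all(w.y == window[0].y for w in window)
        ordered = all(0 < b.x - a.x <= max_step
                      for a, b in zip(window, window[1:]))
        if same_line and ordered:
            return start
    return None


if __name__ == "__main__":
    page = [Word("Date:", 10, 50), Word("Tota1", 40, 300),
            Word("Amount", 95, 300), Word("42.00", 170, 300)]
    print(find_phrase(page, "Total Amount"))  # prints 1
```

With these defaults the OCR misreading `Tota1` still matches `Total` (one substitution in a five-letter word), while `Arnount` does not match `Amount` because it needs two edits. The geometric check rejects lexically correct words that sit on different lines. A practical system would tune the error budget per field and use real bounding-box geometry rather than a single line coordinate.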