Title :
Supporting information extraction from printed documents by Lexico-Semantic pattern matching
Author_Institution :
German Research Center for Artificial Intelligence, Kaiserslautern, Germany
Abstract :
Document analysis and understanding (DAU) systems aim not only to recognize text and document structures but also to extract relevant information from a scanned document. Depending on the class of a document, the information to be extracted may be defined in advance in terms of syntactic as well as semantic structures. In this paper we present a system for detecting such information and transforming it into a semantic representation. The basic component is a pattern matcher that incorporates geometric positions to detect phrases in the document. By defining a Levenshtein distance, the component matches more leniently and thus tolerates OCR errors.
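The error-tolerant matching idea mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `levenshtein` and `find_phrase`, the `max_distance` parameter, and the per-word comparison strategy are assumptions made for illustration, and the geometric position information used by the described pattern matcher is left out entirely.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def find_phrase(tokens, phrase, max_distance=1):
    """Hypothetical helper: return the start indices at which every word of
    `phrase` lies within `max_distance` edits of the corresponding OCR token."""
    words = phrase.lower().split()
    hits = []
    for start in range(len(tokens) - len(words) + 1):
        window = tokens[start:start + len(words)]
        if all(levenshtein(t.lower(), w) <= max_distance
               for t, w in zip(window, words)):
            hits.append(start)
    return hits


# Made-up OCR output with typical confusions ("l" for "I", "rn" for "m").
ocr_tokens = ["lnvoice", "Nurnber", ":", "4711"]
matches = find_phrase(ocr_tokens, "invoice number", max_distance=2)
```

Raising `max_distance` makes the matcher more lenient towards OCR errors but also more likely to accept unrelated words, so in practice the bound would presumably be tuned per pattern.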
Keywords :
image recognition; information retrieval; knowledge acquisition; pattern matching; Levenshtein distance; Lexico-Semantic pattern matching; document analysis and understanding systems; document structures; information extraction; printed documents; semantic representation; semantic structures; text recognition; Artificial intelligence; Data mining; Information analysis; Optical character recognition software; Pattern analysis; Pattern matching; Pattern recognition; Text analysis; Text recognition; Workflow management software
Conference_Title :
Proceedings of the Fourth International Conference on Document Analysis and Recognition, 1997
Conference_Location :
Ulm
Print_ISBN :
0-8186-7898-4
DOI :
10.1109/ICDAR.1997.620605