DocumentCode
2180116
Title
Supporting information extraction from printed documents by Lexico-Semantic pattern matching
Author
Wenzel, Claudia
Author_Institution
German Res. Center for Artificial Intelligence, Kaiserslautern, Germany
Volume
2
fYear
1997
fDate
18-20 Aug 1997
Firstpage
732
Abstract
Document analysis and understanding (DAU) systems aim not only at the recognition of text and document structures but also at the extraction of relevant information from a scanned document. Depending on the class of a document, the information to be extracted may be defined in advance in syntactic structures as well as in semantic structures. In this paper we present a system for detecting such information and transforming it into a semantic representation. The basic component is a pattern matcher which incorporates geometric positions to detect phrases in the document. By using a Levenshtein distance, the component matches more leniently and thus tolerates OCR errors.
Keywords
image recognition; information retrieval; knowledge acquisition; pattern matching; Levenshtein distance; Lexico-Semantic pattern matching; document analysis and understanding systems; document structures; information extraction; printed documents; semantic representation; semantic structures; text recognition; Artificial intelligence; Data mining; Information analysis; Optical character recognition software; Pattern analysis; Pattern matching; Pattern recognition; Text analysis; Text recognition; Workflow management software
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition, 1997., Proceedings of the Fourth International Conference on
Conference_Location
Ulm
Print_ISBN
0-8186-7898-4
Type
conf
DOI
10.1109/ICDAR.1997.620605
Filename
620605