DocumentCode :
311136
Title :
Experiments on extracting structural information from paper documents using syntactic pattern analysis
Author :
Bayer, T.A. ; Walischewski, H.
Author_Institution :
Daimler-Benz AG, Ulm, Germany
Volume :
1
fYear :
1995
fDate :
14-16 Aug 1995
Firstpage :
476
Abstract :
Extracting structural information from paper documents supports daily document processing by, for example, automatically finding index terms and document topics. Knowledge about such components is modeled in a semantic net, which describes geometric properties, spatial relationships, lexical entities, and lexical relationships. The document model is used to extract the sender, date, recipient, and opening and closing formulas from a business letter. A total of 181 business letters were processed: 20 formed the training set and the remaining 161 were used for testing. The error rates for the test set range from 0.022 to 0.049 at an average rejection rate of 0.4. Results show that the computational effort for matching can be limited to O(n^2), given n primitive objects.
Keywords :
document image processing; knowledge acquisition; pattern recognition; semantic networks; daily document processing; document topics; error rates; geometric properties; index terms; lexical entities; lexical relationships; paper documents; primitive objects; semantic net; spatial relationships; structural information; syntactic pattern analysis; Artificial intelligence; Data mining; Electronics packaging; Error analysis; Humans; Information analysis; Optical character recognition software; Pattern analysis; Testing; Text analysis;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Document Analysis and Recognition, 1995., Proceedings of the Third International Conference on
Conference_Location :
Montreal, Que.
Print_ISBN :
0-8186-7128-9
Type :
conf
DOI :
10.1109/ICDAR.1995.599039
Filename :
599039