DocumentCode :
2021184
Title :
Learning of Pattern-Based Rules for Document Classification
Author :
Dengel, Andreas R.
Author_Institution :
Univ. of Kaiserslautern, Kaiserslautern
Volume :
1
fYear :
2007
fDate :
23-26 Sept. 2007
Firstpage :
123
Lastpage :
127
Abstract :
Automatic processing of office documents, such as orders, invoices, or offers, holds significant potential for cost savings. Because such domains have a high percentage of special vocabulary, purely statistical approaches fail in automatic classification. The inherent structure and short text messages require specific approaches. We propose a rule-based method to classify mixed stacks of documents into a set of hierarchically organized classes. Rules are learned by extracting patterns of different types from a document sample. The paper focuses on the architecture and the learning process, presents results in comparison with other techniques, and gives an outlook on how to further improve the system.
Keywords :
document image processing; image classification; knowledge based systems; learning (artificial intelligence); document classification; office documents; pattern-based rules; rule-based method; Cost function; Delay effects; Dispatching; Filtering; Optical character recognition software; Postal services; Routing; Text analysis; Vocabulary; Voting
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Document Analysis and Recognition, 2007. ICDAR 2007. Ninth International Conference on
Conference_Location :
Parana
ISSN :
1520-5363
Print_ISBN :
978-0-7695-2822-9
Type :
conf
DOI :
10.1109/ICDAR.2007.4378688
Filename :
4378688