DocumentCode :
3485654
Title :
Intellix -- End-User Trained Information Extraction for Document Archiving
Author :
Schuster, Daniel ; Muthmann, Klemens ; Esser, Dominik ; Schill, Alexander ; Berger, Marcel ; Weidling, Christoph ; Aliyev, Kamil ; Hofmeier, Andreas
Author_Institution :
Comput. Networks Group, Tech. Univ. Dresden, Dresden, Germany
fYear :
2013
fDate :
25-28 Aug. 2013
Firstpage :
101
Lastpage :
105
Abstract :
Automatic information extraction from scanned business documents is especially valuable in the application domain of document archiving. However, current systems for automated document processing still require considerable configuration work that can only be done by experienced users or administrators. We present an approach to information extraction that builds purely on training examples provided by end users and intentionally omits known efficient extraction techniques, such as rule-based extraction, that require intensive training and/or information extraction expertise. Our evaluation on a large corpus of business documents shows competitive results of above 85% F1-measure on 10 commonly used fields such as document type, sender, receiver, and date. The system is deployed and used inside the commercial document management system DocuWare.
Keywords :
business data processing; document handling; information retrieval; F1-measure; Intellix; automated document processing; commercial document management system DocuWare; document archiving; end-user trained information extraction; scanned business documents; Business; Data mining; Feature extraction; Information retrieval; Layout; Optical character recognition software; Training;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
Conference_Location :
Washington, DC
ISSN :
1520-5363
Type :
conf
DOI :
10.1109/ICDAR.2013.28
Filename :
6628593