DocumentCode
3485654
Title
Intellix -- End-User Trained Information Extraction for Document Archiving
Author
Schuster, Daniel ; Muthmann, Klemens ; Esser, Dominik ; Schill, Alexander ; Berger, Marcel ; Weidling, Christoph ; Aliyev, Kamil ; Hofmeier, Andreas
Author_Institution
Comput. Networks Group, Tech. Univ. Dresden, Dresden, Germany
fYear
2013
fDate
25-28 Aug. 2013
Firstpage
101
Lastpage
105
Abstract
Automatic information extraction from scanned business documents is especially valuable in the application domain of document archiving. However, current systems for automated document processing still require considerable configuration work that can only be done by experienced users or administrators. We present an approach to information extraction that builds purely on end-user-provided training examples and intentionally omits known, efficient extraction techniques, such as rule-based extraction, that require intense training and/or information extraction expertise. Our evaluation on a large corpus of business documents shows competitive results of above 85% F1-measure on 10 commonly used fields such as document type, sender, receiver, and date. The system is deployed and used inside the commercial document management system DocuWare.
Keywords
business data processing; document handling; information retrieval; F1-measure; Intellix; automated document processing; commercial document management system DocuWare; document archiving; end-user trained information extraction; scanned business documents; Business; Data mining; Feature extraction; Information retrieval; Layout; Optical character recognition software; Training
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
Conference_Location
Washington, DC
ISSN
1520-5363
Type
conf
DOI
10.1109/ICDAR.2013.28
Filename
6628593