• DocumentCode
    3485654
  • Title
    Intellix -- End-User Trained Information Extraction for Document Archiving
  • Author
    Schuster, Daniel ; Muthmann, Klemens ; Esser, Dominik ; Schill, Alexander ; Berger, Marcel ; Weidling, Christoph ; Aliyev, Kamil ; Hofmeier, Andreas
  • Author_Institution
    Comput. Networks Group, Tech. Univ. Dresden, Dresden, Germany
  • fYear
    2013
  • fDate
    25-28 Aug. 2013
  • Firstpage
    101
  • Lastpage
    105
  • Abstract
    Automatic information extraction from scanned business documents is especially valuable in the application domain of document archiving. However, current systems for automated document processing still require substantial configuration work that only experienced users or administrators can perform. We present an information extraction approach that builds purely on training examples provided by end users and intentionally omits known efficient extraction techniques, such as rule-based extraction, that require intensive training and/or information extraction expertise. Our evaluation on a large corpus of business documents shows competitive results, with an F1-measure above 85% on 10 commonly used fields such as document type, sender, receiver, and date. The system is deployed and used inside the commercial document management system DocuWare.
  • Keywords
    business data processing; document handling; information retrieval; F1-measure; Intellix; automated document processing; commercial document management system DocuWare; document archiving; end-user trained information extraction; scanned business documents; Business; Data mining; Feature extraction; Information retrieval; Layout; Optical character recognition software; Training
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
  • Conference_Location
    Washington, DC
  • ISSN
    1520-5363
  • Type
    conf
  • DOI
    10.1109/ICDAR.2013.28
  • Filename
    6628593