• DocumentCode
    2337877
  • Title
    Robust recognition of complex entities in text exploiting enterprise data and NLP-techniques
  • Author
    Brauer, Falk ; Schramm, Marcus ; Barczynski, Wojciech ; Löser, Alexander ; Do, Hong-Hai
  • Author_Institution
    SAP Res., SAP AG, Dresden
  • fYear
    2008
  • fDate
    13-16 Nov. 2008
  • Firstpage
    551
  • Lastpage
    558
  • Abstract
    Data transactions between business partners often include unstructured data such as invoices or purchase orders. To process such data automatically, complex business entities need to be identified. Examples of complex entities are products, business partners and purchase orders, which are stored in a supplier relationship management system. Both structured records in the enterprise system and text data describe these complex entities. A major challenge is to correctly associate entities recognized in unstructured data with entities stored in structured data, e.g. enterprise databases. We address this problem and propose a robust process methodology which includes three phases: candidate extraction from unstructured text, generation of initial mappings with structured data, and disambiguation of the mappings exploiting relationships among the entities in the enterprise data and the documents' structure. We describe each step in detail, propose a common architecture and introduce our data model and algorithms.
  • Keywords
    business data processing; database management systems; text analysis; NLP-techniques; data transactions; enterprise data; enterprise databases; robust recognition; supplier relationship management system; text data; Costs; Current supplies; Data mining; Data models; Databases; Identity management systems; Robustness; Supply chain management; Supply chains; Text recognition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Digital Information Management, 2008. ICDIM 2008. Third International Conference on
  • Conference_Location
    London
  • Print_ISBN
    978-1-4244-2916-5
  • Electronic_ISBN
    978-1-4244-2917-2
  • Type
    conf
  • DOI
    10.1109/ICDIM.2008.4746780
  • Filename
    4746780