• DocumentCode
    3147921
  • Title
    Data Augmentation as a Service for Single View Creation
  • Author
    Nambiar, Ullas ; Faruquie, Tanveer A. ; Prasad, K. Hima ; Subramaniam, L. Venkata ; Mohania, Mukesh K.
  • Author_Institution
    IBM Res. - India, New Delhi, India
  • fYear
    2011
  • fDate
    4-9 July 2011
  • Firstpage
    40
  • Lastpage
    47
  • Abstract
    Businesses are increasingly realizing the value of creating a single view of their customers and partners by integrating information residing in 'siloed' datasets within and outside the enterprise. However, augmenting data available within the enterprise with data purchased from third-party providers, or with data residing in a public domain such as the Web, often results in warehouses that contain databases with incomplete and/or inconsistent data. Hence, before the data can become useful, one must eliminate the inconsistency in values appended to the enterprise data. In this paper, we present Data Augmentation as a Service (DAaS), which can help businesses create a consistent and usable single view of entities of interest. Specifically, our service enables business rule writers to quickly create data augmentation rules using our approximate functional dependency driven rule generation scheme. An accompanying challenge comes from having to manage a large number of rules and ensure that new rules do not negate already existing rules. To mitigate this problem, a rule-management and evaluation system that uses the Ripple Down Rules (RDR) framework is provided as part of our service. Using several large real-world datasets, we show our ability to learn rules for imputing attribute values with the accuracy and scalability necessary for enterprise users, how conflicts can arise within rules, and finally our ability to handle those conflicts effectively and with high accuracy.
  • Keywords
    Web services; data handling; World Wide Web; approximate functional dependency driven rule generation scheme; data augmentation; databases; evaluation system; public domain; ripple down rules framework; rule-management; siloed datasets; single view creation; third-party providers; Approximation methods; Asynchronous transfer mode; Business; Data mining; Databases; Joining processes; Knowledge based systems
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Services Computing (SCC), 2011 IEEE International Conference on
  • Conference_Location
    Washington, DC
  • Print_ISBN
    978-1-4577-0863-3
  • Electronic_ISBN
    978-0-7695-4462-5
  • Type
    conf
  • DOI
    10.1109/SCC.2011.14
  • Filename
    6009242