DocumentCode
3147921
Title
Data Augmentation as a Service for Single View Creation
Author
Nambiar, Ullas ; Faruquie, Tanveer A. ; Prasad, K. Hima ; Subramaniam, L. Venkata ; Mohania, Mukesh K.
Author_Institution
IBM Res. - India, New Delhi, India
fYear
2011
fDate
4-9 July 2011
Firstpage
40
Lastpage
47
Abstract
Businesses are increasingly realizing the value of creating a single view of their customers and partners by integrating information residing in 'siloed' datasets within and outside the enterprise. However, the task of augmenting data available within the enterprise with data purchased from third-party providers or residing in a public domain such as the Web often results in warehouses that contain databases with incomplete and/or inconsistent data. Hence, before the data can become useful, one must eliminate the inconsistency in values appended to the enterprise data. In this paper, we present Data Augmentation as a Service (DAaS), which can help businesses create a consistent and usable single view of entities of interest. Specifically, our service will enable business rule writers to quickly create data augmentation rules by using our approximate functional dependency driven rule generation scheme. An accompanying challenge comes from having to manage a large number of rules and ensuring that new rules do not negate already existing rules. To mitigate this problem, a rule-management and evaluation system that uses the Ripple Down Rules (RDR) framework is provided as part of our service. Using several large real-world datasets, we show our ability to learn rules for imputing attribute values with the high accuracy and scalability necessary for enterprise users, how conflicts can arise within rules, and finally our ability to effectively handle those conflicts with high accuracy.
Keywords
Web services; data handling; World Wide Web; approximate functional dependency driven rule generation scheme; data augmentation; databases; evaluation system; public domain; ripple down rules framework; rule-management; siloed datasets; single view creation; third-party providers; Approximation methods; Asynchronous transfer mode; Business; Data mining; Databases; Joining processes; Knowledge based systems
fLanguage
English
Publisher
IEEE
Conference_Titel
Services Computing (SCC), 2011 IEEE International Conference on
Conference_Location
Washington, DC
Print_ISBN
978-1-4577-0863-3
Electronic_ISBN
978-0-7695-4462-5
Type
conf
DOI
10.1109/SCC.2011.14
Filename
6009242