Title :
Enabling an Enhanced Data-as-a-Service Ecosystem
Author :
Smit, Meint ; Shtern, Mark ; Simmons, Bradley ; Litoiu, Marin
Author_Institution :
York Univ., Toronto, ON, Canada
Date :
June 28 - July 3, 2013
Abstract :
The sharing of large and interesting Big Data in cloud environments can be achieved using data-as-a-service, where a provider offers data to interested users. In enhanced data-as-a-service, the data provider also supplies compute infrastructure, allowing users to run analytics tasks local to the data and reducing the (expensive and slow) transmission of data over networks. This paper describes a services-based ecosystem that allows providers to precisely share portions of their data with users, using a model where users submit MapReduce jobs that run on the provider's Hadoop infrastructure. Providers are given mechanisms to filter, segment, and/or transform data before it reaches the user's task. The ecosystem also allows for intermediaries who offer value-added filtrations, segmentations, or transformations of the data (for example, pre-filtering a dataset to only include high-income users). We describe the RESTful services required to enable this ecosystem, introduce a prototype to demonstrate the concept, and present experiments using this ecosystem to both provide and analyze different segments of a single large data set.
Keywords :
cloud computing; data analysis; parallel processing; Hadoop infrastructure; RESTful services; analytics tasks; big data; cloud environments; compute infrastructure; data filtering; data segmentation; data transform; data transmission; enhanced data-as-a-service ecosystem; value-added filtrations; Databases; Ecosystems; Runtime; Servers; Transforms; Twitter; Web services; data-as-a-service; service system
Conference_Title :
Services (SERVICES), 2013 IEEE Ninth World Congress on
Conference_Location :
Santa Clara, CA
Print_ISBN :
978-0-7695-5024-4
DOI :
10.1109/SERVICES.2013.53