Title :
Stream as You Go: The Case for Incremental Data Access and Processing in the Cloud
Author :
Kienzler, Romeo ; Bruggmann, Rémy ; Ranganathan, Anand ; Tatbul, Nesime
Author_Institution :
Dept. of Comput. Sci., ETH Zurich, Zurich, Switzerland
Abstract :
Cloud infrastructures promise to provide high-performance and cost-effective solutions to large-scale data processing problems. In this paper, we identify a common class of data-intensive applications for which the latency of uploading data into the cloud before it can be processed may negate the cloud's linear scalability advantage. For such applications, we propose a "stream-as-you-go" approach for incrementally accessing and processing data based on a stream data management architecture. We describe our approach in the context of a DNA sequence analysis use case and compare it against the state of the art in MapReduce-based DNA sequence analysis and incremental MapReduce frameworks. We provide experimental results over an implementation of our approach based on the IBM InfoSphere Streams computing platform deployed on Amazon EC2, showing an order-of-magnitude improvement in total processing time over the state of the art.
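The abstract's central argument is that uploading all data before processing adds a transfer delay that parallel compute cannot hide, whereas incremental access lets transfer and processing overlap. A toy two-stage pipeline model can illustrate this. It is a minimal sketch under stated assumptions, not the authors' cost model or their InfoSphere Streams implementation: the functions `upload_then_process` and `stream_as_you_go` and all parameter values are hypothetical, and the model ignores per-chunk overhead, bandwidth variance, and cluster start-up time.

```python
"""Toy model contrasting upload-then-process with stream-as-you-go.

Hypothetical illustration only: not the paper's cost model. Per-chunk
overhead, bandwidth variance, and cluster start-up are ignored.
"""


def upload_then_process(data_gb: float, bw_gb_per_s: float,
                        nodes: int, rate_gb_per_s: float) -> float:
    """Upload the whole dataset first, then process it on `nodes` workers."""
    upload = data_gb / bw_gb_per_s
    compute = data_gb / (nodes * rate_gb_per_s)
    return upload + compute


def stream_as_you_go(data_gb: float, bw_gb_per_s: float,
                     nodes: int, rate_gb_per_s: float,
                     chunks: int) -> float:
    """Two-stage pipeline: each chunk is processed as soon as it has
    arrived and the processing stage is free."""
    chunk = data_gb / chunks
    upload_per_chunk = chunk / bw_gb_per_s
    compute_per_chunk = chunk / (nodes * rate_gb_per_s)
    arrived = 0.0  # time at which the current chunk finishes uploading
    done = 0.0     # time at which the processing stage becomes free
    for _ in range(chunks):
        arrived += upload_per_chunk
        # A chunk can start only once it has arrived and the previous
        # chunk has been processed.
        done = max(arrived, done) + compute_per_chunk
    return done


if __name__ == "__main__":
    # Hypothetical numbers: 100 GB at 0.5 GB/s, 8 workers at 0.25 GB/s each.
    print(upload_then_process(100, 0.5, 8, 0.25))        # 250.0
    print(stream_as_you_go(100, 0.5, 8, 0.25, chunks=100))  # 200.5
```

With a single chunk the two models coincide; with many chunks the pipelined total approaches the larger of the two stage times plus the latency of one chunk, which is why the benefit grows when transfer time is comparable to or larger than parallel compute time.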
Keywords :
DNA; cloud computing; DNA sequence analysis; cloud infrastructures; cloud processing; data processing; data intensive applications; data transfer latency; incremental data access; incremental mapreduce frameworks; mapreduce based DNA sequence analysis; stream data management architecture; streams computing; Bioinformatics; Genomics; Sequences; Sorting;
Conference_Title :
Data Engineering Workshops (ICDEW), 2012 IEEE 28th International Conference on
Conference_Location :
Arlington, VA
Print_ISBN :
978-1-4673-1640-8
DOI :
10.1109/ICDEW.2012.69