Title :
V for Vicissitude: The Challenge of Scaling Complex Big Data Workflows
Author :
Ghit, Bogdan ; Capota, M. ; Hegeman, Tim ; Hidders, Jan ; Epema, Dick ; Iosup, Alexandru
Author_Institution :
Parallel & Distrib. Syst. Group, Delft Univ. of Technol., Delft, Netherlands
Abstract :
In this paper we present the scaling of BTWorld, our MapReduce-based approach to observing and analyzing the global BitTorrent network, which we have been monitoring for the past 4 years. BTWorld currently provides a comprehensive and complex set of queries implemented in Pig Latin, with data dependencies between them, which translate to several MapReduce jobs that have a heavy-tailed distribution with respect to both execution time and input size. Processing BitTorrent data in excess of 1 TB with our BTWorld workflow required an in-depth analysis of the entire software stack and the design of a complete optimization cycle. We analyze our system from both theoretical and experimental perspectives, and we show how we attained a scale of data processing 15 times larger than in our previous results.
Keywords :
Big Data; data analysis; data reduction; optimisation; BTWorld scaling; BitTorrent data processing; BitTorrent network; MapReduce; complex Big Data workflow scaling; optimization cycle design; software stack; Big data; Data mining; Monitoring; Optimization; Peer-to-peer computing; Runtime; Software;
Conference_Title :
Cluster, Cloud and Grid Computing (CCGrid), 2014 14th IEEE/ACM International Symposium on
Conference_Location :
Chicago, IL
DOI :
10.1109/CCGrid.2014.97