DocumentCode :
635205
Title :
Assisting developers of Big Data Analytics Applications when deploying on Hadoop clouds
Author :
Weiyi Shang ; Zhen Ming Jiang ; Hemmati, Hadi ; Adams, Bram ; Hassan, Ahmed E. ; Martin, Patrick
Author_Institution :
Software Anal. & Intell. Lab. (SAIL), Queen's Univ., Kingston, ON, Canada
fYear :
2013
fDate :
18-26 May 2013
Firstpage :
402
Lastpage :
411
Abstract :
Big data analytics is the process of examining large amounts of data (big data) in an effort to uncover hidden patterns or unknown correlations. Big Data Analytics Applications (BDA Apps) are a new type of software applications, which analyze big data using massive parallel processing frameworks (e.g., Hadoop). Developers of such applications typically develop them using a small sample of data in a pseudo-cloud environment. Afterwards, they deploy the applications in a large-scale cloud environment with considerably more processing power and larger input data (reminiscent of the mainframe days). Working with BDA App developers in industry over the past three years, we noticed that the runtime analysis and debugging of such applications in the deployment phase cannot be easily addressed by traditional monitoring and debugging approaches. In this paper, as a first step in assisting developers of BDA Apps for cloud deployments, we propose a lightweight approach for uncovering differences between pseudo and large-scale cloud deployments. Our approach makes use of the readily-available yet rarely used execution logs from these platforms. Our approach abstracts the execution logs, recovers the execution sequences, and compares the sequences between the pseudo and cloud deployments. Through a case study on three representative Hadoop-based BDA Apps, we show that our approach can rapidly direct the attention of BDA App developers to the major differences between the two deployments. Knowledge of such differences is essential in verifying BDA Apps when analyzing big data in the cloud. Using injected deployment faults, we show that our approach not only significantly reduces the deployment verification effort, but also provides very few false positives when identifying deployment failures.
Keywords :
cloud computing; data analysis; formal verification; parallel processing; program debugging; public domain software; software fault tolerance; system monitoring; Hadoop clouds; Hadoop-based BDA Apps; big data analysis; big data analytics applications; cloud deployments; deployment verification effort reduction; developer assistance; execution log abstraction; execution sequence recovery; parallel processing frameworks; software applications; Context; Data handling; Data storage systems; Information management; Joining processes; Keyword search; Programming; Big-Data Analytics Application; Cloud Computing; Hadoop; Log Analysis; Monitoring and Debugging;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Software Engineering (ICSE), 2013 35th International Conference on
Conference_Location :
San Francisco, CA
Print_ISBN :
978-1-4673-3073-2
Type :
conf
DOI :
10.1109/ICSE.2013.6606586
Filename :
6606586