DocumentCode :
185615
Title :
Norming to Performing: Failure Analysis and Deployment Automation of Big Data Software Developed by Highly Iterative Models
Author :
Keun Soo Yim
fYear :
2014
fDate :
3-6 Nov. 2014
Firstpage :
144
Lastpage :
155
Abstract :
We observe many interesting failure characteristics from Big Data software developed and released using some kinds of highly iterative development models (e.g., Agile). ~16% of failures occur due to faults in software deployments (e.g., packaging and pushing to production). Our analysis shows that many such production outages are at least partially due to some human errors rooted in the high frequency and complexity of software deployments. ~51% of the observed human errors (e.g., transcription, education, and communication error types) are avoidable through automation. We thus develop a fault-tolerant automation framework to make it efficient to automate end-to-end software deployment procedures. We apply the framework to two Big Data products. Our case studies show the complexity of the deployment procedures of multi-homed Big Data applications and help us to study the effectiveness of the validation and verification techniques for user-provided automation programs. We analyze the production failures of the two products again after the automation. Our experimental data shows how the automation and the associated procedure improvements reduce the deployment faults and overall failure rate, and improve the feature launch velocity. Automation facilitates more formal, procedure-driven software engineering practices which not only reduce the manual work and human-oriented, avoidable production outages but also help engineers to better understand overall software engineering procedures, making them more auditable, predictable, reliable, and efficient. We discuss two novel metrics to evaluate progress in mitigating human errors and the conditions indicating points to start such a transition from owner-driven deployment practice.
Keywords :
Big Data; program verification; software fault tolerance; Big Data products; Big Data software; deployment automation; end-to-end software deployment procedures; failure analysis; failure characteristics; failure rate; fault-tolerant automation framework; iterative models; multihomed Big Data applications; procedure-driven software engineering practices; production failures; production outages; software deployment faults; user-provided automation programs; validation techniques; verification techniques; Automation; Big data; Fault tolerance; Fault tolerant systems; Production; Runtime; Software; Automation; failure classification; human error; incremental validation; iterative development; software deployment;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Software Reliability Engineering (ISSRE), 2014 IEEE 25th International Symposium on
Conference_Location :
Naples
ISSN :
1071-9458
Print_ISBN :
978-1-4799-6032-3
Type :
conf
DOI :
10.1109/ISSRE.2014.31
Filename :
6982622