• DocumentCode
    3078298
  • Title

    Understanding Unsuccessful Executions in Big-Data Systems

  • Author

    Rosa, Andrea ; Chen, Lydia Y. ; Binder, Walter

  • Author_Institution
    Fac. of Inf. Lugano, Univ. della Svizzera italiana (USI), Lugano, Switzerland
  • fYear
    2015
  • fDate
    4-7 May 2015
  • Firstpage
    741
  • Lastpage
    744
  • Abstract
    Big-data applications are increasingly used in today's large-scale datacenters for a wide variety of purposes, such as solving scientific problems, running enterprise services, and computing data-intensive tasks. Due to the growing scale of these systems and the complexity of the applications they run, jobs in big-data systems experience unsuccessful terminations of different natures. While a large body of existing studies sheds light on failures that occur in large-scale datacenters, the current literature overlooks the characteristics and the performance impairment of a broader class of unsuccessful executions, which can arise from application failures, dependency violations, machine constraints, job kills, and task preemption. Nonetheless, deepening our understanding of this field is of paramount importance, as unsuccessful executions can lower user satisfaction, impair reliability, and lead to high resource waste. In this paper, we describe the problem of unsuccessful executions in big-data systems and highlight the critical importance of improving our knowledge of this subject. We review the existing literature in this field, discuss its limitations, and present our own contributions to the problem, along with our research plan for the future.
  • Keywords
    Big Data; computer centres; Big-Data systems; data-intensive task computing; dependency violations; enterprise services; job kills; large-scale datacenters; machine constraints; task preemption; Analytical models; Correlation; Google; Hardware; Predictive models; Reliability; Training;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster, Cloud and Grid Computing (CCGrid), 2015 15th IEEE/ACM International Symposium on
  • Conference_Location
    Shenzhen
  • Type

    conf

  • DOI
    10.1109/CCGrid.2015.138
  • Filename
    7152546