• DocumentCode
    659499
  • Title

    Correlation-based performance analysis for full-system MapReduce optimization

  • Author

    Qi Guo ; Yan Li ; Tao Liu ; Kun Wang ; Guancheng Chen ; Xiaoming Bao ; Wentao Tang

  • Author_Institution
    IBM Res. - China, Beijing, China
  • fYear
    2013
  • fDate
    6-9 Oct. 2013
  • Firstpage
    753
  • Lastpage
    761
  • Abstract
    Big Data is changing this world at a surprising speed, and MapReduce plays a critical role in finding insights in Big Data. However, to efficiently extract insights from Big Data, performance optimization of MapReduce applications is a challenging task. To facilitate the full-system optimization of MapReduce applications, we propose a correlation-based performance analysis approach to efficiently identify critical outliers. The basic intuition is that critical outliers are key to the overall performance and they can only be accurately identified by correlating different phases, tasks and resources. Based on the proposed approach, we further implement a correlation-based performance analysis tool, called Sonata. It can efficiently identify critical outliers, and then, recommend optimization suggestions for practitioners based on embedded rules. Since the performance overhead is key to the applicability of a performance tool, we conduct experiments to demonstrate that Sonata is a practical tool with less than 5% overhead and good scalability. To demonstrate the effectiveness of Sonata, we share several cases during the performance tuning of IBM Platform Symphony™ with the help of Sonata.
  • Keywords
    Big Data; optimisation; IBM Platform Symphony; Sonata tool; correlation-based performance analysis tool; critical outliers identification; full-system MapReduce optimization; performance optimization; Correlation; Hardware; History; Optimization; Performance analysis; Runtime; Tuning; MapReduce
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Big Data, 2013 IEEE International Conference on
  • Conference_Location
    Silicon Valley, CA
  • Type

    conf

  • DOI
    10.1109/BigData.2013.6691648
  • Filename
    6691648