• DocumentCode
    3497150
  • Title
    Breaking the boundary for whole-system performance optimization of big data
  • Author
    Yan Li ; Kun Wang ; Qi Guo ; Xin Li ; Xiaochen Zhang ; Guancheng Chen ; Tao Liu ; Jian Li
  • Author_Institution
    IBM Res. - China, China
  • fYear
    2013
  • fDate
    4-6 Sept. 2013
  • Firstpage
    126
  • Lastpage
    131
  • Abstract
    MapReduce plays a critical role in extracting insights from big data. Optimizing the performance of MapReduce programs is challenging because it requires a comprehensive understanding of the whole system, including both the hardware layers (processors, storage, networks, etc.) and the software stack (operating system, JVM, runtime, applications, etc.). However, most existing performance tuning and optimization efforts rely on empirical and heuristic attempts, and how to build a systematic framework that breaks the boundaries between layers for performance optimization remains an open problem. In this paper, we propose a performance evaluation framework that correlates performance metrics from different layers, providing insights that help pinpoint performance issues efficiently. The framework is composed of a series of predefined patterns, each of which indicates one or more potential issues, and it maps the behavior of a MapReduce program to the corresponding resource utilization. This holistic approach allows users with different levels of experience to optimize the performance of MapReduce programs. We use the Terasort benchmark running on a 10-node Power7R2 cluster as a real case to show how the framework improves performance: with it, we reduced the Terasort run time from 47 minutes to less than 8 minutes. In addition to best practices for performance tuning, we summarize several key findings from the workload analysis that are valuable for JVM, MapReduce runtime, and application design.
  • Keywords
    data handling; optimisation; parallel programming; performance evaluation; resource allocation; 10-node Power7R2 cluster; JVM; MapReduce program performance optimization; Terasort benchmark; big data; hardware layers; performance evaluation framework; performance tuning; resource utilization; software stacks; whole-system performance optimization; Hardware; Indexes; Java; Optimization; Runtime; Software; Tuning;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Low Power Electronics and Design (ISLPED), 2013 IEEE International Symposium on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4799-1234-6
  • Type
    conf
  • DOI
    10.1109/ISLPED.2013.6629278
  • Filename
    6629278