• DocumentCode
    1914322
  • Title
    WaxElephant: A Realistic Hadoop Simulator for Parameters Tuning and Scalability Analysis
  • Author
    Ren, Zujie ; Liu, Zhijun ; Xu, Xianghua ; Wan, Jian ; Shi, Weisong ; Zhou, Min
  • Author_Institution
    Sch. of Comput. Sci. & Technol., Hangzhou Dianzi Univ., Hangzhou, China
  • fYear
    2012
  • fDate
    20-23 Sept. 2012
  • Firstpage
    9
  • Lastpage
    16
  • Abstract
    MapReduce is becoming the state-of-the-art computation paradigm for processing large-scale datasets on clusters of tens to thousands of nodes. Hadoop, an open-source implementation of the MapReduce framework, has gained much popularity due to its high scalability and performance. Two challenging issues for a large-scale Hadoop cluster are how to analyze its scalability and how to identify the optimal parameter configurations. To address these issues, we designed and implemented a Hadoop simulator called WaxElephant, which provides the following capabilities: (1) loading real MapReduce workloads derived from the historical logs of Hadoop clusters and replaying the job execution history, (2) synthesizing workloads from their statistical characteristics and executing them, (3) identifying the optimal parameter configurations, and (4) analyzing the scalability of the cluster. Extensive experiments have been conducted to validate the accuracy of the WaxElephant simulator.
  • Keywords
    digital simulation; distributed processing; pattern clustering; public domain software; statistical analysis; MapReduce framework; WaxElephant simulator; cluster scalability analysis; job execution history replaying; large-scale Hadoop cluster; large-scale dataset processing; open-source implementation; optimal parameter configuration identification; parameter tuning; real MapReduce workload loading; realistic Hadoop simulator; scalability analysis; workload statistical characteristics; workload synthesis; Computer architecture; Distribution functions; Educational institutions; Generators; Production; Scalability; Tuning; Hadoop simulator; MapReduce; Parameters tuning
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    ChinaGrid Annual Conference (ChinaGrid), 2012 Seventh
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4673-2623-0
  • Electronic_ISBN
    978-0-7695-4816-6
  • Type
    conf
  • DOI
    10.1109/ChinaGrid.2012.25
  • Filename
    6337309