• DocumentCode
    3680228
  • Title
    Performance Characterization of In-Memory Data Analytics on a Modern Cloud Server
  • Author
    Ahsan Javed Awan;Mats Brorsson;Vladimir Vlassov;Eduard Ayguade
  • Author_Institution
    Software &
  • fYear
    2015
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    In the last decade, data analytics have rapidly progressed from traditional disk-based processing to modern in-memory processing. However, little effort has been devoted to enhancing performance at the micro-architecture level. This paper characterizes the performance of in-memory data analytics using the Apache Spark framework. We use a single-node NUMA machine and identify the bottlenecks hampering the scalability of workloads. We also quantify the inefficiencies at the micro-architecture level for various data analysis workloads. Through empirical evaluation, we show that Spark workloads do not scale linearly beyond twelve threads, due to work time inflation and thread-level load imbalance. Further, at the micro-architecture level, we observe memory-bound latency to be the major cause of work time inflation.
  • Keywords
    "Benchmark testing","Sparks","Instruction sets","Scalability","Servers","Data analysis","Big data"
  • Publisher
    IEEE
  • Conference_Titel
    Big Data and Cloud Computing (BDCloud), 2015 IEEE Fifth International Conference on
  • Type
    conf
  • DOI
    10.1109/BDCloud.2015.37
  • Filename
    7310708