• DocumentCode
    186345
  • Title

    Understanding the behavior of in-memory computing workloads

  • Author

    Tao Jiang ; Qianlong Zhang ; Rui Hou ; Lin Chai ; McKee, Sally A. ; Zhen Jia ; Sun, Ninghui

  • Author_Institution
    SKL Comput. Archit., ICT, Beijing, China
  • fYear
    2014
  • fDate
    26-28 Oct. 2014
  • Firstpage
    22
  • Lastpage
    30
  • Abstract
    The increasing demands of big data applications have led researchers and practitioners to turn to in-memory computing to speed processing. For instance, the Apache Spark framework stores intermediate results in memory to deliver good performance on iterative machine learning and interactive data analysis tasks. To the best of our knowledge, though, little work has been done to understand Spark's architectural and microarchitectural behaviors. Furthermore, although conventional commodity processors have been well optimized for traditional desktops and HPC, their effectiveness for Spark workloads remains to be studied. To shed some light on the effectiveness of conventional general-purpose processors on Spark workloads, we study their behavior in comparison to those of Hadoop, CloudSuite, SPEC CPU2006, TPC-C, and DesktopCloud. We evaluate the benchmarks on a 17-node Xeon cluster. Our performance results reveal that Spark workloads have significantly different characteristics from Hadoop and traditional HPC benchmarks. At the system level, Spark workloads have good memory bandwidth utilization (up to 50%), stable memory accesses, and high disk IO request frequency (200 per second). At the microarchitectural level, the cache and TLB are effective for Spark workloads, but the L2 cache miss rate is high. We hope this work yields insights for chip and datacenter system designers.
  • Keywords
    Big Data; parallel processing; storage management; 17-node Xeon cluster; Apache Spark framework; HPC benchmark; Hadoop; big data application; in-memory computing workload; interactive data analysis; iterative machine learning; memory access; memory bandwidth utilization; microarchitectural behavior; Bandwidth; Benchmark testing; Big data; Hardware; Microarchitecture; Program processors; Sparks;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Workload Characterization (IISWC), 2014 IEEE International Symposium on
  • Conference_Location
    Raleigh, NC
  • Print_ISBN
    978-1-4799-6452-9
  • Type

    conf

  • DOI
    10.1109/IISWC.2014.6983036
  • Filename
    6983036