Author_Institution :
Sch. of Comput. Sci. & Eng., Georgia Inst. of Technol., Atlanta, GA, USA
Abstract :
We conducted a microbenchmarking study of the time, energy, and power of computation and memory access on several existing platforms. These platforms represent candidate compute-node building blocks of future high-performance computing systems. Our analysis uses the "energy roofline" model, developed in prior work, which we extend in two ways. First, we improve the model's accuracy by accounting for power caps, basic memory hierarchy access costs, and measurement of random memory access patterns. Second, we empirically evaluate server-, mini-, and mobile-class platforms that span a range of compute and power characteristics. Our study includes a dozen such platforms, including x86 (both conventional and Xeon Phi), ARM, GPU, and hybrid (AMD APU and other SoC) processors. Together, these data and our model analytically characterize the range of algorithmic regimes where we might prefer one building block to others. The analysis suggests critical values of arithmetic intensity around which some systems may switch from being more to less time- and energy-efficient than others. It further suggests how, with respect to intensity, operations should be throttled to meet a power cap. We hope our methods can help make debates about the relative merits of these and other systems more quantitative, analytical, and insightful.
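As a rough illustration of how a power-capped energy roofline can be evaluated, the sketch below computes time, energy, and average power for a kernel with a given amount of work and memory traffic. It is a minimal sketch under stated assumptions, not the model as fitted in the study. The function name, the parameter names (`tau_flop`, `tau_mem`, `eps_flop`, `eps_mem`, `pi0`, `delta_pi`), and the numbers in the usage example are all hypothetical. It assumes compute and memory time overlap perfectly, that constant power `pi0` is paid for the whole run, and that a cap limits the dynamic power above `pi0` to `delta_pi`.

```python
def energy_roofline(W, Q, tau_flop, tau_mem, eps_flop, eps_mem, pi0, delta_pi=None):
    """Return (time, energy, average power) for one kernel.

    W, Q            : number of operations and bytes moved (arithmetic intensity = W / Q)
    tau_flop/tau_mem: assumed time per operation / per byte
    eps_flop/eps_mem: assumed dynamic energy per operation / per byte
    pi0             : assumed constant power drawn for the whole run
    delta_pi        : assumed cap on dynamic power above pi0 (None means uncapped)
    """
    t_flop = W * tau_flop                  # time if compute were the only limit
    t_mem = Q * tau_mem                    # time if memory traffic were the only limit
    e_dyn = W * eps_flop + Q * eps_mem     # dynamic energy, independent of run time

    t = max(t_flop, t_mem)                 # perfect overlap: the slower side dominates
    if delta_pi is not None:
        # Under a cap, the run is stretched (throttled) until the
        # dynamic power e_dyn / t no longer exceeds delta_pi.
        t = max(t, e_dyn / delta_pi)

    energy = e_dyn + pi0 * t               # constant power is paid for the full duration
    return t, energy, energy / t


# Hypothetical, unitless numbers chosen only to keep the arithmetic easy to follow.
uncapped = energy_roofline(100, 10, 1.0, 2.0, 3.0, 5.0, pi0=1.0)
capped = energy_roofline(100, 10, 1.0, 2.0, 3.0, 5.0, pi0=1.0, delta_pi=2.0)
```

In this toy setting the capped run takes longer and uses more total energy than the uncapped one, because constant power accrues over the stretched run time, while its average power settles at `pi0 + delta_pi`. Sweeping `W / Q` with such a function is one way to look for the intensity values at which two parameter sets trade places.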
Keywords :
parallel processing; power aware computing; AMD-APU processor; ARM processor; GPU processor; HPC compute-node building blocks; SoC processor; Xeon Phi x86 processor; algorithmic time; analytical analysis; arithmetic intensity; basic-memory hierarchy access costs; compute characteristics; conventional x86 processor; critical values; empirical evaluation; energy roofline model; energy-efficiency; high-performance computing systems; hybrid processor; memory access pattern measurement; microbenchmarking study; miniclass platform; mobile-class platform; model accuracy improvement; power caps; power characteristics; quantitative analysis; server-class platform; time-efficiency; Abstracts; Algorithm design and analysis; Computational modeling; Graphics processing units; Mobile communication; Power measurement; algorithms; energy; performance modeling; power; system balance;