Title :
Data analytics workloads: Characterization and similarity analysis
Author :
Panda, Reena ; John, Lizy Kurian
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Texas at Austin, Austin, TX, USA
Abstract :
The performance of modern computer systems depends greatly on the wide range of workloads that run on them. Thus, computer designers and researchers need a representative set of workloads, covering the different classes of real-world applications, for processor design-space evaluation studies. While a number of benchmark suites are available, a few common ones, such as the SPEC CPU2006 benchmarks, are widely used by researchers because of ease of setup or simulation time constraints. However, because popular benchmarks such as SPEC CPU2006 do not capture the characteristics of the wide variety of emerging real-world applications, using them as the basis for performance evaluation may lead to suboptimal designs or misleading results. In this paper, we characterize the behavior of data analytics workloads, an important class of emerging applications, and perform a systematic similarity analysis against the popular SPEC CPU2006 and SPECjbb2013 benchmark suites. To characterize the workloads, we use hardware performance counter based measurements and a variety of extracted microarchitecture-independent workload characteristics. We then use statistical data analysis techniques, namely principal component analysis and clustering, to analyze the similarity and dissimilarity among these classes of applications. We demonstrate the inherent differences between the characteristics of the different classes of applications and show how to arrive at meaningful subsets of benchmarks, which will help in faster and more accurate targeted early-stage hardware system performance evaluation.
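The similarity analysis named in the abstract (principal component analysis followed by clustering) can be illustrated with a minimal sketch. This is not the authors' implementation: the workload names, the three metrics, and every numeric value below are invented placeholders, and single-linkage agglomerative clustering is assumed only because the abstract does not say which clustering method is used.

```python
import math
import random

# Hypothetical workloads and synthetic metric values (NOT from the paper).
# Assumed columns: icache misses per kilo-instruction, branch mispredict %, IPC.
names = ["analytics_a", "analytics_b", "cpu_a", "cpu_b"]
metrics = [[30.0, 8.0, 0.6], [28.0, 7.5, 0.7], [2.0, 3.0, 1.8], [3.0, 2.5, 1.9]]

def standardize(rows):
    """Scale every column to zero mean and unit (population) variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]

def covariance(rows):
    """Covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(d)]
            for i in range(d)]

def top_components(cov, k, iters=200, seed=0):
    """Leading k eigenvectors via power iteration with deflation."""
    rng = random.Random(seed)
    d = len(cov)
    cov = [row[:] for row in cov]  # work on a copy
    comps = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        start = math.sqrt(sum(x * x for x in v))
        v = [x / start for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left in this direction
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        comps.append(v)
        for i in range(d):  # deflate: remove the component just found
            for j in range(d):
                cov[i][j] -= lam * v[i] * v[j]
    return comps

def project(rows, comps):
    """Coordinates of each row in the principal-component space."""
    return [[sum(a * b for a, b in zip(r, c)) for c in comps] for r in rows]

def agglomerate(points, k):
    """Merge the two closest clusters (single linkage) until k remain."""
    clusters = [[i] for i in range(len(points))]
    def gap(a, b):
        return min(math.dist(points[i], points[j]) for i in a for j in b)
    while len(clusters) > k:
        pairs = [(x, y) for x in range(len(clusters))
                 for y in range(x + 1, len(clusters))]
        a, b = min(pairs, key=lambda p: gap(clusters[p[0]], clusters[p[1]]))
        merged = clusters.pop(b)  # b > a, so index a stays valid
        clusters[a].extend(merged)
    return clusters

scaled = standardize(metrics)
components = top_components(covariance(scaled), k=2)
scores = project(scaled, components)
clusters = agglomerate(scores, k=2)

for group in clusters:
    print([names[i] for i in group])
```

Taking one workload from each resulting cluster is one simple way to form a reduced benchmark subset; the number of retained components and clusters here is arbitrary.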
Keywords :
data analysis; pattern clustering; performance evaluation; principal component analysis; set theory; statistical analysis; SPEC CPU2006 benchmarks; benchmark suites; clustering techniques; computer systems; data analytic workloads; dissimilarity analysis; hardware performance counter based measurements; hardware system performance evaluation; microarchitecture independent workload characteristics; principal component analysis; processor design-space evaluation studies; simulation time constraints; suboptimal designs; systematic similarity analysis; Benchmark testing; Data analysis; Databases; Hardware; Measurement; Principal component analysis; Radiation detectors; Big Data; Clustering; Data Analytics; Micro-architectural Evaluation; Performance Evaluation; Principal Component Analysis; Workload Characterization;
Conference_Title :
Performance Computing and Communications Conference (IPCCC), 2014 IEEE International
Conference_Location :
Austin, TX
DOI :
10.1109/PCCC.2014.7017065