DocumentCode :
2217667
Title :
Analyzing and improving performance scalability of commercial server workloads on a chip multiprocessor
Author :
Ishizaki, Kazuaki ; Nakatani, Toshio ; Daijavad, Shahrokh
fYear :
2009
fDate :
4-6 Oct. 2009
Firstpage :
217
Lastpage :
226
Abstract :
A chip multiprocessor (CMP) with many low performance cores can achieve high performance or high performance/power for commercial server applications. The large number of hardware threads of a CMP with many low performance cores poses significant challenges to application developers in writing scalable applications. Many papers have assessed the architectural characteristics and the performance scalability, and some of them have identified lock contention as one of the scalability bottlenecks. However, there are few studies that resolved these problems, analyzed their causes, and compared the architectural characteristics before and after the scalability limitations were addressed. We analyzed and resolved some of the problems limiting the scalability of three commercial server applications with 64 hardware threads. We also did before and after comparisons of the architectural characteristics affected by the scalability enhancements, supporting the development of new processors. We addressed the lock contention with changes in the Java code. Our enhancements improved the performance scalability by up to 132%. We show that though the causes of lock contention are in different software layers, they share certain similarities and can be organized in three categories. Our comparisons reveal that the CPI and data TLB miss rates decrease, but the L2 data cache miss rates, L2 instruction cache miss rates, and memory traffic increase. These results suggest that we need to address the performance scalability problems of an application before we can accurately measure the architectural characteristics of a CMP.
Keywords :
Java; cache storage; multiprocessing systems; network servers; CPI; Java code; L2 data cache miss rates; L2 instruction cache miss rates; architectural characteristics; chip multiprocessor; commercial server workloads; data TLB miss rates; lock contention; memory traffic; performance scalability; scalability enhancements; Application software; CMOS technology; Frequency; Hardware; Parallel processing; Performance analysis; Pipelines; Process design; Scalability; Yarn;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Workload Characterization, 2009. IISWC 2009. IEEE International Symposium on
Conference_Location :
Austin, TX
Print_ISBN :
978-1-4244-5156-2
Electronic_ISBN :
978-1-4244-5157-2
Type :
conf
DOI :
10.1109/IISWC.2009.5306781
Filename :
5306781