Title :
Power Token Balancing: Adapting CMPs to Power Constraints for Parallel Multithreaded Workloads
Author :
Juan M. Cebrián;Juan L. Aragón;Stefanos Kaxiras
Author_Institution :
Dept. of Comput. Eng., Univ. of Murcia, Murcia, Spain
fDate :
5/1/2011 12:00:00 AM
Abstract :
In recent years, virtually all processor architectures have come to employ multiple cores per chip (CMPs). Legacy (i.e., single-core) power-saving techniques can be used in CMPs that run either sequential applications or independent multithreaded workloads. However, new challenges arise when running parallel shared-memory applications. In the latter case, sacrificing some performance in a single core (thread) in order to be more energy-efficient might unintentionally delay the remaining cores (threads) at synchronization points (locks/barriers), thereby harming the performance of the whole application. CMPs increasingly face thermal and power-related problems during typical use. Such problems can be addressed by setting a power budget for the processor/core. This paper first studies the behavior of different techniques for matching a predefined power budget in a CMP. While legacy techniques work properly for thread-independent/multiprogrammed workloads, parallel workloads expose the problem of adapting the power of each core independently in a thread-dependent scenario. To solve this problem we propose a novel mechanism, Power Token Balancing (PTB), aimed at accurately matching an external power constraint by balancing the power consumed among the different cores using a power token-based approach while optimizing energy efficiency. We can use power (seen as tokens or coupons) from non-critical threads for the benefit of critical threads. PTB runs transparently for thread-independent/multiprogrammed workloads and can also be used as a spin-lock detector based on power patterns. Results show that PTB matches a predefined power budget more accurately than DVFS (total energy consumed over the budget is reduced to 8% for a 16-core CMP) with only a 3% energy increase. Finally, accuracy in matching the power budget can be traded for energy efficiency, reducing energy by 4% with a 20% accuracy.
Keywords :
"Power demand","Synchronization","Benchmark testing","Spinning","Accuracy","Microarchitecture","Pipelines"
Conference_Titel :
Parallel & Distributed Processing Symposium (IPDPS), 2011 IEEE International
Print_ISBN :
978-1-61284-372-8
DOI :
10.1109/IPDPS.2011.49