DocumentCode :
166612
Title :
Multiobjective optimization technique based on monitoring information to increase the performance of thread migration on multicores
Author :
Lorenzo, O.G. ; Pena, Tomas F. ; Cabaleiro, J.C. ; Pichel, Juan C. ; Rivera, F.F.
Author_Institution :
CITIUS Centro de Investig. en Tecnoloxias da Informacion, Univ. de Santiago de Compostela, Santiago de Compostela, Spain
fYear :
2014
fDate :
22-26 Sept. 2014
Firstpage :
416
Lastpage :
423
Abstract :
Multicore systems present on-board memory hierarchies and communication networks that influence their performance when they execute shared memory parallel codes. Characterizing this influence is complex, and understanding the effect of particular hardware configurations on different codes is of paramount importance. In this paper, monitoring information extracted from hardware counters at runtime is used to characterize the behaviour of each thread in the parallel code in terms of three values: the number of floating point operations per second, the operational intensity, and the memory access latency. Note that these values characterize the Roofline Model with the inclusion of additional information about memory access latencies. We propose to use this information to guide thread migration strategies that improve the efficiency of the execution of the code by increasing locality and affinity. The idea behind this proposal is to treat these three values as the objective functions of a multiobjective optimization problem. The proposed technique is an iterative method inspired by evolutionary optimization algorithms. To this end, an individual utility function is defined to represent the relative importance of these values. This function is a weighted product that can be considered as representative of the performance of each parallel thread. Different configurations of the SAXPY and SDOT kernels on multicores were used to validate the benefits of the proposed thread migration strategies. The results show that our strategy produces improvements of up to 25% in scenarios where locality and affinity are low, and negligible degradation is observed when they are high. The use of hardware counters produces low overheads when extracting monitoring information.
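Illustrative note: the abstract describes the per-thread utility only as a weighted product of the three monitored values and does not give its exact form. The Python sketch below is one possible reading, not the authors' formula. The names `ThreadSample`, `utility` and `worst_thread`, the exponent weights, and the choice to divide by latency (so that higher latency lowers the score) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ThreadSample:
    """Per-thread values derived from hardware counters (hypothetical container)."""
    gflops: float     # floating point operations per second, in GFLOP/s
    intensity: float  # operational intensity, in FLOPs per byte
    latency: float    # mean memory access latency, in cycles


def utility(s, w_flops=1.0, w_intensity=1.0, w_latency=1.0):
    """Weighted product of the three values.

    Assumption: the weights act as exponents and latency is inverted,
    so a thread with slower memory accesses scores lower.
    """
    return (s.gflops ** w_flops) * (s.intensity ** w_intensity) / (s.latency ** w_latency)


def worst_thread(samples):
    """Return the id of the thread with the lowest utility.

    Assumption: such a thread would be a natural candidate for migration.
    """
    return min(samples, key=lambda tid: utility(samples[tid]))


# Two threads with equal throughput but different memory latency.
samples = {
    0: ThreadSample(gflops=2.0, intensity=0.25, latency=100.0),
    1: ThreadSample(gflops=2.0, intensity=0.25, latency=400.0),
}
candidate = worst_thread(samples)
```

In this reading, raising a weight makes the score more sensitive to the corresponding value, which is one way the "relative importance" mentioned in the abstract could be expressed.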
Keywords :
monitoring; optimisation; shared memory systems; telecommunication networks; SAXPY kernels; SDOT kernels; communication networks; monitoring information; multicore systems; multiobjective optimization technique; on-board memory hierarchies; roofline model; shared memory parallel codes; thread migration; Hardware; Instruction sets; Message systems; Monitoring; Multicore processing; Optimization; Vectors;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Cluster Computing (CLUSTER), 2014 IEEE International Conference on
Conference_Location :
Madrid
Type :
conf
DOI :
10.1109/CLUSTER.2014.6968733
Filename :
6968733