DocumentCode :
3210146
Title :
A Systemic Strategy for Tuning Intra-node Collective Communication on Multicore Systems
Author :
Liu, Zhiqiang ; Song, Junqiang ; Ren, Kaijun ; Xu, Fen ; Qu, Xiaoling
Author_Institution :
Coll. of Comput., Nat. Univ. of Defense Technol., Changsha, China
fYear :
2009
fDate :
17-19 Dec. 2009
Firstpage :
14
Lastpage :
21
Abstract :
In the HPC domain, most applications are built on MPI and use collective operations in their communication kernels, so improving the performance of collectives has long been a focus of research. In recent work on optimizing collectives for multi-core clusters, hierarchical algorithm designs are notable. Such algorithms can greatly reduce inter-node traffic, but they increase the intra-node traffic load at the same time. Moreover, in hierarchical collectives the intra-node part takes a growing share of the time as the number of cores per node keeps increasing. Improving the performance of intra-node collectives is therefore critical to overall performance. On multi-core systems, however, process affinity greatly affects the performance of an intra-node collective, which makes improving the overall performance of intra-node collectives challenging. To address this problem, we propose a novel and systemic strategy for tuning the performance of intra-node collectives. As illustrative examples, we implemented our strategy on a dual-socket Intel Clovertown platform and tuned Broadcast and Allgather, improving their performance by up to 14% and 52%, respectively.
Keywords :
message passing; microprocessor chips; MPI; dual-socket Intel Clovertown platform; hierarchical algorithm designs; high performance computing; intranode collective communication tuning; intranode traffic load; multicore clusters; multicore systems; systemic strategy; Algorithm design and analysis; Application software; Broadcasting; Clustering algorithms; Computer science; Design optimization; Educational institutions; Kernel; Multicore processing; Telecommunication traffic; Allgather; Broadcast; Collective communication; MPI; Multicore;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Frontier of Computer Science and Technology, 2009. FCST '09. Fourth International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-0-7695-3932-4
Electronic_ISBN :
978-1-4244-5467-9
Type :
conf
DOI :
10.1109/FCST.2009.101
Filename :
5392942