DocumentCode :
3664306
Title :
Auto-tuning Non-blocking Collective Communication Operations
Author :
Youcef Barigou;Vishwanath Venkatesan;Edgar Gabriel
Author_Institution :
Dept. of Comput. Sci., Univ. of Houston, Houston, TX, USA
fYear :
2015
fDate :
5/1/2015 12:00:00 AM
Firstpage :
1204
Lastpage :
1213
Abstract :
Collective operations are widely used in large scale scientific applications, and critical to the scalability of these applications for large process counts. It has also been demonstrated that collective operations have to be carefully tuned for a given platform and application scenario to maximize their performance. Non-blocking collective operations extend the concept of collective operations by offering the additional benefit of being able to overlap communication and computation. This paper presents the automatic run-time tuning of non-blocking collective communication operations, which allows the communication library to choose the best performing implementation for a non-blocking collective operation on a case by case basis. The paper demonstrates that libraries using a single algorithm or implementation for a non-blocking collective operation will inevitably lead to suboptimal performance in many scenarios, and thus validate the necessity for run-time tuning of these operations. The benefits of the approach are further demonstrated for an application kernel using a multi-dimensional Fast Fourier Transform. The results obtained for the application scenario indicate a performance improvement of up to 40% compared to the current state of the art.
Keywords :
"Libraries","Tuning","Runtime","Schedules","Benchmark testing","Time measurement","Fast Fourier transforms"
Publisher :
ieee
Conference_Titel :
Parallel and Distributed Processing Symposium Workshop (IPDPSW), 2015 IEEE International
Type :
conf
DOI :
10.1109/IPDPSW.2015.15
Filename :
7284450
Link To Document :
بازگشت