Progression of MPI Non-blocking Collective Operations Using Hyper-Threading

Author

Miwa, Masahiro ; Nakashima, Kohta

Author_Institution

Fujitsu Labs. Ltd., Kawasaki, Japan

fYear

2015

fDate

4-6 March 2015

Firstpage

163

Lastpage

171

Abstract

MPI non-blocking collective operations offer a high-level interface to MPI library users and potentially allow communication to be overlapped with calculation. Progression, which drives communication running in the background of the calculation, is the key factor in achieving an efficient overlap. The most commonly used progression method is manual progression, in which a progression function is called from within the main calculation. With manual progression, MPI library users have to estimate the communication timing to maximize the overlap effect, and thus have to manage complex communication optimization themselves. An alternative approach is to use separate communication threads, with which communication-calculation overlap can be achieved simply. However, context switches between the calculation thread and the communication thread lower performance in the frequent case where all cores are used for calculation. In this paper, we propose a novel threaded progression method that uses Hyper-Threading to maximize the overlap effect of non-blocking collective operations. We apply MONITOR/MWAIT instructions to the communication thread running on Hyper-Threading so that conflicts over shared core resources do not degrade the calculation thread. Evaluation on 8-node InfiniBand-connected IA server clustered systems confirmed that the latency is kept small and that our approach has an advantage over manual progression in terms of communication-calculation overlap. On the CG benchmark, used as a real application, our method achieved a 32% reduction in execution time compared with blocking collective operations, which is nearly perfect overlap. Although manual progression also achieved perfect overlap, our method has the advantage that no communication timing tuning is required for each application.

Keywords

application program interfaces; message passing; multi-threading; IA server clustered systems; MONITOR/MWAIT instructions; MPI library users; MPI nonblocking collective operation progression; blocking collective operation; communication calculation; communication calculation overlap; communication threads; complex communication optimization; context switches; hyper threading; manual progression; nonblocking collective operations; novel threaded progression method; overlap effect; progression function; Benchmark testing; Degradation; Libraries; Manuals; Message systems; Monitoring; Protocols

fLanguage

English

Publisher

IEEE

Conference_Titel

2015 23rd Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)

Conference_Location

Turku

ISSN

1066-6192

Type

conf

DOI

10.1109/PDP.2015.68

Filename

7092715