• DocumentCode
    704133
  • Title
    Progression of MPI Non-blocking Collective Operations Using Hyper-Threading
  • Author
    Miwa, Masahiro ; Nakashima, Kohta
  • Author_Institution
    Fujitsu Labs. Ltd., Kawasaki, Japan
  • fYear
    2015
  • fDate
    4-6 March 2015
  • Firstpage
    163
  • Lastpage
    171
  • Abstract
    MPI non-blocking collective operations offer a high-level interface to MPI library users, and potentially allow communication to be overlapped with calculation. Progression, which controls communications running in the background of the calculation, is the key factor in achieving an efficient overlap. The most commonly used progression method is manual progression, in which a progression function is called from within the main calculation. With manual progression, MPI library users have to estimate the communication timing to maximize the overlap effect, and thus have to manage complex communication optimization themselves. An alternative approach to progression is the use of separate communication threads. With communication threads, communication-calculation overlap can be achieved simply. However, context switches between the calculation thread and the communication thread lower performance in the frequent case where all cores are used for calculation. In this paper, we propose a novel threaded progression method using Hyper-Threading to maximize the overlap effect of non-blocking collective operations. We apply MONITOR/MWAIT instructions to the communication thread on Hyper-Threading so that it does not degrade the calculation thread through conflicts over shared core resources. Evaluation on an 8-node InfiniBand-connected IA server cluster confirmed that the latency is suppressed to a small level and that our approach has an advantage over manual progression in terms of communication-calculation overlap. Using the CG benchmark as a real application, our method achieved a 32% reduction in execution time compared with blocking collective operations, which is nearly perfect overlap. Although manual progression also achieved perfect overlap, our method has the advantage that no communication timing tuning is required for each application.
  • Keywords
    application program interfaces; message passing; multi-threading; IA server clustered systems; MONITOR/MWAIT instructions; MPI library users; MPI nonblocking collective operation progression; blocking collective operation; communication calculation; communication calculation overlap; communication threads; complex communication optimization; context switches; hyper threading; manual progression; nonblocking collective operations; novel threaded progression method; overlap effect; progression function; Benchmark testing; Degradation; Libraries; Manuals; Message systems; Monitoring; Protocols
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel, Distributed and Network-Based Processing (PDP), 2015 23rd Euromicro International Conference on
  • Conference_Location
    Turku
  • ISSN
    1066-6192
  • Type
    conf
  • DOI
    10.1109/PDP.2015.68
  • Filename
    7092715
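
The contrast the abstract draws between manual progression and threaded progression can be illustrated with a toy model. The sketch below is not the authors' implementation and does not use MPI: `ToyCollective`, `manual_progression`, `threaded_progression`, and the `poll_every` parameter are all hypothetical names introduced here, and the Hyper-Threading and MONITOR/MWAIT aspects of the paper are not modeled. It only assumes that a non-blocking collective needs a fixed number of progression calls to complete.

```python
import threading


class ToyCollective:
    """Hypothetical stand-in for a non-blocking collective (not an MPI API).

    It completes only after `steps` calls to progress().
    """

    def __init__(self, steps):
        self.remaining = steps
        self.lock = threading.Lock()

    def progress(self):
        """Advance by one step; return True once the operation is complete."""
        with self.lock:
            if self.remaining > 0:
                self.remaining -= 1
            return self.remaining == 0


def compute_chunk(x):
    """Placeholder for one unit of the main calculation."""
    return x * x


def manual_progression(work, steps, poll_every):
    """Manual progression: the calculation loop itself calls progress().

    `poll_every` is the tuning knob the user has to choose per application.
    Returns (result, item index at which the operation completed or None,
    number of steps left to finish after the calculation, with no overlap).
    """
    op = ToyCollective(steps)
    total = 0
    done_at = None
    for i, x in enumerate(work, start=1):
        total += compute_chunk(x)
        if i % poll_every == 0 and done_at is None and op.progress():
            done_at = i
    leftover = 0
    while op.remaining > 0:  # blocking wait for whatever was not overlapped
        op.progress()
        leftover += 1
    return total, done_at, leftover


def threaded_progression(work, steps):
    """Threaded progression: a separate thread drives the operation."""
    op = ToyCollective(steps)

    def drive():
        while not op.progress():
            pass

    worker = threading.Thread(target=drive)
    worker.start()
    total = sum(compute_chunk(x) for x in work)  # no progress calls here
    worker.join()
    return total, op.remaining


if __name__ == "__main__":
    print(manual_progression(range(100), steps=4, poll_every=10))
    print(manual_progression(range(100), steps=4, poll_every=50))
    print(threaded_progression(range(100), steps=4))
```

In this toy model, a well-chosen `poll_every` lets the operation finish inside the calculation loop, while a poorly chosen one leaves steps to be completed afterwards without overlap. The threaded variant needs no such parameter, which mirrors the tuning-free property the abstract claims, though it says nothing about the core-resource contention the paper addresses.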