Title :
Design and Implementation of Portable and Efficient Non-blocking Collective Communication
Author :
Nomura, Akihiro ; Ishikawa, Yutaka ; Maruyama, Naoya ; Matsuoka, Satoshi
Author_Institution :
Global Scientific Inf. & Comput. Center, Tokyo Inst. of Technol., Tokyo, Japan
Abstract :
Non-blocking communications are widely used in parallel applications to hide communication overheads by overlapping computation with communication. While most existing implementations provide non-blocking versions of point-to-point communications, there is no portable and efficient implementation of non-blocking collectives, partly because dependent communications need to interrupt application execution contexts. This paper presents a portable and efficient user-level implementation technique for non-blocking communications. It allows users to design non-blocking collectives by declaring their operations and dependencies through the provided APIs, without having to manage the complicated details of their progression. While user-level implementations can be less efficient than kernel-level ones due to the cost of OS context switches, we solve this problem by employing Marcel, a user-level lightweight thread library, when invoking communication operations. More specifically, each communication operation is mapped to one Marcel thread and scheduled for execution when its dependencies are satisfied by certain events. All executable operations and the main user thread run concurrently without any explicit invocations. Performance evaluations with micro-benchmarks demonstrate the effectiveness of the proposed technique: compared to an existing OS-thread-based method, it reduces CPU load to less than 10% while achieving a similar level of communication latency. We also discuss and compare the descriptive power of internal expressions for non-blocking communications.
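The dependency-driven scheduling described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and does not use the Marcel API: Python generators stand in for user-level threads, everything runs in a single OS thread, and the names Scheduler, declare, event, trigger and run are hypothetical. It shows only the core idea: operations are declared with dependencies, each becomes runnable once all of its dependencies (including external events such as a message arrival) have completed, and ready operations are interleaved with slices of the user's computation.

```python
from collections import deque


class Op:
    """One declared operation (hypothetical); the paper maps each to a Marcel thread."""

    def __init__(self, name, action, deps):
        self.name = name
        self.action = action            # None for external events (e.g. message arrival)
        self.dependents = []
        self.done = False
        # Count only dependencies that have not completed yet.
        self.pending = sum(1 for d in deps if not d.done)


class Scheduler:
    """Hypothetical dependency-counting scheduler; not the paper's actual API."""

    def __init__(self):
        self.ready = deque()
        self.log = []                   # completion order, for inspection

    def declare(self, name, action, deps=()):
        op = Op(name, action, deps)
        for d in deps:
            if not d.done:
                d.dependents.append(op)
        if op.pending == 0:
            self.ready.append(op)
        return op

    def event(self, name):
        """An external event; it completes only when trigger() is called."""
        ev = Op(name, None, ())
        ev.pending = 1                  # sentinel: never becomes ready on its own
        return ev

    def trigger(self, ev):
        self._complete(ev)

    def _complete(self, op):
        op.done = True
        self.log.append(op.name)
        for dep in op.dependents:       # wake operations whose last dependency just finished
            dep.pending -= 1
            if dep.pending == 0:
                self.ready.append(dep)

    def run(self, main, arrivals):
        """Interleave the user's main generator with ready operations.

        arrivals maps a tick number to the event the simulated network
        completes at that tick.
        """
        tick, main_alive = 0, True
        while main_alive or self.ready or any(t >= tick for t in arrivals):
            if tick in arrivals:
                self.trigger(arrivals[tick])
            if main_alive:
                try:
                    next(main)          # one slice of user computation
                except StopIteration:
                    main_alive = False
            if self.ready:              # one slice of communication progress
                op = self.ready.popleft()
                op.action()
                self._complete(op)
            tick += 1


# Usage: an intermediate node of a broadcast tree that receives from its
# parent and then forwards to two children.
sched = Scheduler()
sent = []
arrived = sched.event("recv_from_parent")
fwd1 = sched.declare("send_to_child_1", lambda: sent.append(1), [arrived])
fwd2 = sched.declare("send_to_child_2", lambda: sent.append(2), [arrived])
sched.declare("bcast_complete", lambda: None, [fwd1, fwd2])


def main():
    for _ in range(3):
        yield                           # stands in for a chunk of computation


sched.run(main(), arrivals={1: arrived})
print(sched.log)
# ['recv_from_parent', 'send_to_child_1', 'send_to_child_2', 'bcast_complete']
```

Unlike the design in the abstract, the main "thread" here has to yield cooperatively so that the loop can interleave work; the sketch is meant only to make the dependency-counting schedule concrete, where an operation fires exactly when its pending-dependency count drops to zero.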
Keywords :
application program interfaces; operating systems (computers); parallel processing; API; Marcel user level light-weight thread library; OS context switch; application execution contexts; communication overhead hiding; dependent communications; nonblocking collective communication; overlapped communication; overlapped computation; parallel applications; performance evaluations; point-to-point communications; user-level implementation technique; Algorithm design and analysis; Context; Instruction sets; Kernel; Libraries; Message systems; Timing;
Conference_Title :
2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)
Conference_Location :
Ottawa, ON
Print_ISBN :
978-1-4673-1395-7
DOI :
10.1109/CCGrid.2012.96