Title :
Preliminary Experiments with XKaapi on Intel Xeon Phi Coprocessor
Author :
Lima, Joao V. F. ; Broquedis, Francois ; Gautier, Thierry ; Raffin, Bruno
Author_Institution :
Grenoble Inst. of Technol., Grenoble, France
Abstract :
This paper presents preliminary performance comparisons of parallel applications developed natively for the Intel Xeon Phi accelerator using three different parallel programming environments and their associated runtime systems. We compare Intel OpenMP, Intel CilkPlus and XKaapi on the same benchmark suite, and we provide comparisons between an Intel Xeon Phi coprocessor and a Sandy Bridge Xeon-based machine. Our benchmark suite is composed of three computing kernels: a Fibonacci computation that lets us study the overhead and the scalability of the runtime system, an NQueens application generating irregular and dynamic tasks, and a Cholesky factorization algorithm. We also compare the Cholesky factorization with the parallel algorithm provided by the Intel MKL library for Intel Xeon Phi. The performance evaluation shows that our XKaapi data-flow parallel programming environment exhibits the lowest overhead of all and is highly competitive with the native OpenMP and CilkPlus environments on Xeon Phi. Moreover, the efficient handling of data-flow dependencies between tasks lets our XKaapi environment expose more parallelism for some applications, such as the Cholesky factorization. In that case, we observe substantial gains with up to 180 hardware threads over the state-of-the-art MKL, with a 47% performance increase for 60 hardware threads.
Keywords :
Fibonacci sequences; benchmark testing; coprocessors; data flow computing; matrix decomposition; message passing; multi-threading; parallel algorithms; Cholesky factorization algorithm; Fibonacci computation; Intel CilkPlus; Intel MKL library; Intel OpenMP; Intel Xeon Phi Coprocessor; Intel Xeon Phi accelerator; NQueens application; Sandy Bridge Xeon-based machine; XKaapi data-flow parallel programming environment; benchmark suite; computing kernels; data-flow dependency handling; dynamic tasks; hardware thread; irregular tasks; parallel algorithm; parallel applications; performance evaluation; runtime system overhead; runtime system scalability; Benchmark testing; Computer architecture; Coprocessors; Hardware; Instruction sets; Parallel programming; Runtime; Intel Xeon Phi; accelerators; data-flow programming; runtime systems; work stealing
Conference_Title :
Computer Architecture and High Performance Computing (SBAC-PAD), 2013 25th International Symposium on
Conference_Location :
Porto de Galinhas
Print_ISBN :
978-1-4799-2927-6
DOI :
10.1109/SBAC-PAD.2013.28