MrPhi: An Optimized MapReduce Framework on Intel Xeon Phi Coprocessors

Author

Mian Lu ; Yun Liang ; Huynh Phung Huynh ; Zhongliang Ong ; Bingsheng He ; Rick Siow Mong Goh

Author_Institution

Inst. of High Performance Comput., A*STAR, Singapore, Singapore

Volume

26

Issue

11

fYear

2015

Firstpage

3066

Lastpage

3078

Abstract

In this work, we develop MrPhi, an optimized MapReduce framework for heterogeneous computing platforms equipped with multiple Intel Xeon Phi coprocessors. To the best of our knowledge, this is the first work to optimize the MapReduce framework on the Xeon Phi. We first focus on employing advanced features of the Xeon Phi to achieve high performance on a single coprocessor. We propose a vectorization-friendly technique and SIMD hash computation algorithms to utilize the SIMD vector units. We then pipeline the map and reduce phases to improve resource utilization. Furthermore, we replace multiple local arrays with low-cost atomic operations on a global array to improve thread scalability. For a given application, our framework automatically detects which techniques are suitable to apply. Moreover, we extend our framework to a heterogeneous platform to utilize all hardware resources effectively. We adopt non-blocking data transfer to hide the communication overhead, and aligned memory transfer to fully utilize the PCIe bandwidth between the host and the coprocessor. We conduct comprehensive experiments to benchmark the Xeon Phi and compare our optimized MapReduce framework with a state-of-the-art multi-core MapReduce framework (Phoenix++). In an evaluation of six real-world applications, our optimized framework is 1.2× to 38× faster than Phoenix++ on a single Xeon Phi, depending on the application. Additionally, four applications achieve linear scalability on a platform equipped with up to four Xeon Phi coprocessors.
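
The contrast the abstract draws between multiple local arrays and atomic operations on a single global array can be illustrated with a small counting example. The following is a minimal C++ sketch under stated assumptions, not MrPhi's implementation: the function names, the modulo "hash", and the strided work split are all hypothetical, and it uses portable std::thread and std::atomic rather than anything specific to the Xeon Phi.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Variant A (hypothetical): every thread counts into a private array,
// and the private arrays are merged into one result afterwards.
std::vector<std::uint64_t> count_with_local_arrays(
        const std::vector<std::uint32_t>& keys,
        std::size_t num_buckets, unsigned num_threads) {
    std::vector<std::vector<std::uint64_t>> local(
        num_threads, std::vector<std::uint64_t>(num_buckets, 0));
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            // Strided split: thread t handles keys t, t + T, t + 2T, ...
            for (std::size_t i = t; i < keys.size(); i += num_threads)
                ++local[t][keys[i] % num_buckets];
        });
    }
    for (auto& w : workers) w.join();

    // Merge step: memory and merge cost both grow with the thread count.
    std::vector<std::uint64_t> merged(num_buckets, 0);
    for (const auto& l : local)
        for (std::size_t b = 0; b < num_buckets; ++b) merged[b] += l[b];
    return merged;
}

// Variant B (hypothetical): all threads update one shared array with
// atomic increments, so no per-thread copies and no merge are needed.
std::vector<std::uint64_t> count_with_global_atomics(
        const std::vector<std::uint32_t>& keys,
        std::size_t num_buckets, unsigned num_threads) {
    std::vector<std::atomic<std::uint64_t>> global(num_buckets);
    for (auto& g : global) g.store(0, std::memory_order_relaxed);

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            for (std::size_t i = t; i < keys.size(); i += num_threads)
                global[keys[i] % num_buckets].fetch_add(
                    1, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();

    // Copy the atomic counters into a plain vector for the caller.
    std::vector<std::uint64_t> out(num_buckets);
    for (std::size_t b = 0; b < num_buckets; ++b)
        out[b] = global[b].load(std::memory_order_relaxed);
    return out;
}
```

Both variants produce the same counts. The second keeps a single array regardless of the number of threads, which is the property the abstract associates with better thread scalability; whether it is actually faster depends on contention and on the cost of atomic operations on the target hardware.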

Keywords

coprocessors; data handling; multi-threading; optimisation; parallel processing; resource allocation; Intel Xeon Phi coprocessor; MapReduce framework optimization; MrPhi; SIMD hash computation algorithm; communication overhead; heterogeneous computing platform; nonblocking data transfer; resource utilization; thread scalability; vectorization friendly technique; Arrays; Containers; Coprocessors; Hardware; Instruction sets; Vectors; Intel Many Integrated Core architecture (MIC); MapReduce; Xeon Phi; heterogeneous computing; high performance computing; parallel programming

fLanguage

English

Journal_Title

Parallel and Distributed Systems, IEEE Transactions on

Publisher

IEEE

ISSN

1045-9219

Type

jour

DOI

10.1109/TPDS.2014.2365784

Filename

6939728