Title :
Reducing Communication Overhead in the High Performance Conjugate Gradient Benchmark on Tianhe-2
Author :
Fangfang Liu ; Chao Yang ; Yiqun Liu ; Xianyi Zhang ; Yutong Lu
Author_Institution :
Inst. of Software, Beijing, China
Abstract :
The High Performance Conjugate Gradient (HPCG) benchmark, proposed in 2013, has drawn increasing attention from both academia and industry. Unlike the High Performance Linpack (HPL) benchmark, which has a very high computation-to-communication ratio, HPCG contains both neighboring and global communication that may severely degrade parallel performance. To reduce the overhead of neighboring communication, we overlap halo updates with halo-independent computations. To hide the cost of the global reductions in vector dot-products, we make use of two reformulated CG algorithms, namely Gropp's asynchronous CG and the pipelined CG. Further optimizations reduce the extra overhead introduced by the reformulated CG algorithms. Experiments on Tianhe-2, the world's largest heterogeneous system, show that the optimized HPCG code scales to 256 nodes (49,920 cores) with a nearly ideal weak-scaling efficiency of over 90% and an aggregate performance of 10.51 Tflops.
Keywords :
conjugate gradient methods; parallel processing; Gropp asynchronous CG algorithm; HPCG benchmark; HPL benchmark; Tianhe-2 system; communication overhead reduction; halo updates; halo-independent computation; high performance Linpack benchmark; high performance conjugate gradient benchmark; parallel performance; pipelined CG algorithm; Benchmark testing; Global communication; Kernel; Optimization; Sparse matrices; Standards; Vectors; HPCG; Tianhe-2; asynchronous CG; communication-computation overlap; pipelined CG;
Conference_Title :
Distributed Computing and Applications to Business, Engineering and Science (DCABES), 2014 13th International Symposium on
Conference_Location :
Xian Ning
Print_ISBN :
978-1-4799-4170-4
DOI :
10.1109/DCABES.2014.6