Title :
Performance analysis of broadcasting algorithms on the Intel Single-Chip Cloud Computer
Author :
Matienzo, John ; Jerger, Natalie Enright
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Toronto, Toronto, ON, Canada
Abstract :
Efficient broadcasting is essential for good performance on distributed or multiprocessor systems. Broadcasts are commonly used to implement message-passing synchronization primitives, such as barriers, and also appear frequently in the setup stage of scientific applications. The Intel Single-Chip Cloud Computer (SCC), an experimental processor, uses synchronous message passing to facilitate communication between its 48 cores. RCCE, the SCC's message passing library, implements broadcasting in a traditional way: sending n-1 unicast messages, where n is the number of cores participating in the broadcast. This implementation can hinder performance as the number of cores participating in the broadcast increases or when the data being sent to each core is large. In addition, in the RCCE implementation the broadcasting core is blocked from doing any useful work until all cores have received the broadcast. This paper explores several broadcasting schemes that take advantage of the resources of the SCC and the RCCE library. For example, we explore a scheme that propagates a broadcast to multiple cores in parallel and a scheme that parallelizes off-chip memory accesses which would otherwise need to be done sequentially. Our best broadcast scheme achieves a 35× speedup over the RCCE implementation. We also demonstrate that our improved broadcasting substantially reduces the time spent on communication in some benchmarks. While the broadcast schemes presented in this paper are implemented specifically for the SCC, they provide insight into the more general problem of broadcast communication and could be adapted to other types of distributed and multiprocessor systems.
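The contrast drawn in the abstract, between n-1 sequential unicasts from one core and a broadcast propagated by several cores in parallel, can be illustrated with a small sketch. It models only the communication schedule and does not call RCCE. The parallel variant is a standard binomial-tree broadcast, chosen here as an assumption; the abstract does not say which propagation scheme the paper uses. The function names linear_broadcast_rounds and binomial_tree_schedule are hypothetical.

```python
def linear_broadcast_rounds(n):
    """Sequential send steps when the root unicasts to each of the other n-1 cores."""
    return max(n - 1, 0)


def binomial_tree_schedule(n, root=0):
    """Build a binomial-tree broadcast schedule for n cores (illustrative assumption).

    Returns a list of rounds. Each round is a list of (sender, receiver) pairs
    that could proceed in parallel: every core that already holds the data
    forwards it to one core that does not.
    """
    rounds = []
    have = 1  # number of cores, in rank order relative to the root, holding the data
    while have < n:
        pairs = []
        for rel in range(have):
            dst = rel + have
            if dst < n:
                # Map relative ranks back to absolute core IDs.
                pairs.append(((rel + root) % n, (dst + root) % n))
        rounds.append(pairs)
        have *= 2
    return rounds


if __name__ == "__main__":
    n = 48  # core count of the SCC, as stated in the abstract
    schedule = binomial_tree_schedule(n)
    print("linear rounds:", linear_broadcast_rounds(n))
    print("tree rounds:", len(schedule))
```

Under this model, 48 cores need 47 sequential sends in the linear scheme but only 6 rounds in the tree, since the number of cores holding the data doubles each round. The round count is not a performance prediction: it ignores message size, on-chip buffer limits, and off-chip memory accesses, all of which the abstract identifies as relevant.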
Keywords :
broadcasting; cloud computing; message passing; microprocessor chips; multiprocessing systems; performance evaluation; resource allocation; synchronisation; Intel single-chip cloud computer; RCCE implementation; RCCE library; SCC message passing library; broadcast communication; broadcasting algorithms; broadcasting core; distributed systems; experimental processor; message passing synchronization primitives; multiprocessor systems; off-chip memory accesses; performance analysis; synchronous message passing; unicast messages; Benchmark testing; Broadcasting; Computers; Libraries; Message passing; System-on-chip; Tiles
Conference_Title :
Performance Analysis of Systems and Software (ISPASS), 2013 IEEE International Symposium on
Conference_Location :
Austin, TX
Print_ISBN :
978-1-4673-5776-0
Electronic_ISBN :
978-1-4673-5778-4
DOI :
10.1109/ISPASS.2013.6557167