DocumentCode
1783345
Title
Using Multiple Threads to Accelerate Single Thread Performance
Author
Sura, Zehra ; O'Brien, Kevin ; Brunheroto, Jose
Author_Institution
IBM T.J. Watson Res. Center, Yorktown Heights, NY, USA
fYear
2014
fDate
19-23 May 2014
Firstpage
985
Lastpage
994
Abstract
Computing systems are being designed with an increasing number of hardware cores. To effectively use these cores, applications need to maximize the amount of parallel processing and minimize the time spent in sequential execution. In this work, we aim to exploit fine-grained parallelism beyond the parallelism already encoded in an application. We define an execution model using a primary core and some number of secondary cores that collaborate to speed up the execution of sequential code regions. This execution model relies on cores that are physically close to each other and have fast communication paths between them. For this purpose, we introduce dedicated hardware queues for low-latency transfer of values between cores, and define special "enque" and "deque" instructions to use the queues. Further, we develop compiler analyses and transformations to automatically derive fine-grained parallel code from sequential code regions. We implemented this model for exploiting fine-grained parallelization in the IBM XL compiler framework and in a simulator for the Blue Gene/Q system. We also studied the Sequoia benchmarks to determine code sections where our techniques are applicable. We evaluated our work using these code sections, and observed an average speedup of 1.32 on 2 cores, and an average speedup of 2.05 on 4 cores. Since these code sections are otherwise sequentially executed, we conclude that our approach is useful for accelerating single thread performance.
Keywords
multi-threading; parallelising compilers; program diagnostics; software performance evaluation; Blue Gene/Q system; IBM XL compiler framework; automatic fine-grained parallel code generation; code sections; compiler analysis; computing systems; deque instructions; enque instructions; execution model; fine-grained parallelism; hardware queues; low-latency value transfer; multithreading; parallel processing; sequential code region execution; sequential execution; single thread performance acceleration; time spent minimization; Acceleration; Benchmark testing; Hardware; Instruction sets; Parallel processing; Partitioning algorithms; Registers;
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel and Distributed Processing Symposium, 2014 IEEE 28th International
Conference_Location
Phoenix, AZ
ISSN
1530-2075
Print_ISBN
978-1-4799-3799-8
Type
conf
DOI
10.1109/IPDPS.2014.104
Filename
6877328