• DocumentCode
    1783345
  • Title

    Using Multiple Threads to Accelerate Single Thread Performance

  • Author

    Sura, Zehra ; O'Brien, Kevin ; Brunheroto, Jose

  • Author_Institution
    IBM T.J. Watson Research Center, Yorktown Heights, NY, USA
  • fYear
    2014
  • fDate
    19-23 May 2014
  • Firstpage
    985
  • Lastpage
    994
  • Abstract
    Computing systems are being designed with an increasing number of hardware cores. To effectively use these cores, applications need to maximize the amount of parallel processing and minimize the time spent in sequential execution. In this work, we aim to exploit fine-grained parallelism beyond the parallelism already encoded in an application. We define an execution model using a primary core and some number of secondary cores that collaborate to speed up the execution of sequential code regions. This execution model relies on cores that are physically close to each other and have fast communication paths between them. For this purpose, we introduce dedicated hardware queues for low-latency transfer of values between cores, and define special "enque" and "deque" instructions to use the queues. Further, we develop compiler analyses and transformations to automatically derive fine-grained parallel code from sequential code regions. We implemented this model for exploiting fine-grained parallelization in the IBM XL compiler framework and in a simulator for the Blue Gene/Q system. We also studied the Sequoia benchmarks to determine code sections where our techniques are applicable. We evaluated our work using these code sections, and observed an average speedup of 1.32 on 2 cores, and an average speedup of 2.05 on 4 cores. Since these code sections are otherwise sequentially executed, we conclude that our approach is useful for accelerating single thread performance.
  • Keywords
    multi-threading; parallelising compilers; program diagnostics; software performance evaluation; Blue Gene/Q system; IBM XL compiler framework; automatic fine-grained parallel code generation; code sections; compiler analysis; computing systems; deque instructions; enque instructions; execution model; fine-grained parallelism; hardware queues; low-latency value transfer; multithreading; parallel processing; sequential code region execution; sequential execution; single thread performance acceleration; time spent minimization; Acceleration; Benchmark testing; Hardware; Instruction sets; Parallel processing; Partitioning algorithms; Registers
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Processing Symposium, 2014 IEEE 28th International
  • Conference_Location
    Phoenix, AZ
  • ISSN
    1530-2075
  • Print_ISBN
    978-1-4799-3799-8
  • Type

    conf

  • DOI
    10.1109/IPDPS.2014.104
  • Filename
    6877328