Title :
Compiler-Assisted Data Distribution and Network Configuration for Chip Multiprocessors
Author :
Li, Yong; Abousamra, A.; Melhem, Rami; Jones, Alex K.
Author_Institution :
Comput. Eng. Program, Univ. of Pittsburgh, Pittsburgh, PA, USA
Abstract :
Data access latency, a limiting factor in the performance of chip multiprocessors, grows significantly with the number of cores in nonuniform cache architectures with distributed cache banks. To mitigate this effect, we use a compiler-based approach to leverage data access locality, choose an optimized data placement, and efficiently configure the on-chip network. The proposed experimental compiler framework employs novel compilation techniques to discover and represent multithreaded memory access patterns (MMAPs). At runtime, symbolic MMAPs are resolved and used by a partitioning algorithm to partition the allocated memory blocks among the forked threads of the analyzed application. This partition enforces data ownership by associating each piece of data with the core that executes the thread owning it. The communication pattern of the application can then be extracted from the partition. We demonstrate how this information can be used in an experimental architecture to accelerate applications. In particular, our compiler-assisted data partitioning approach shows a 20 percent speedup over shared caching and a 5 percent speedup over the closest runtime approximation, first touch. By leveraging the communication pattern, we achieve performance comparable to that of a system that uses a complex centralized network configuration system at runtime. Thus, our final system saves significant runtime complexity and achieves an additional 5.1 percent speedup through the addition of the reconfigurable network.
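The partitioning and communication-extraction steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes the resolved MMAPs have already been reduced to per-thread access counts for each memory block, and it uses a simple "most frequent accessor owns the block" rule as a stand-in for the paper's partitioning algorithm. The names `partition_blocks`, `communication_pattern`, and `access_counts` are hypothetical.

```python
from collections import defaultdict


def partition_blocks(access_counts):
    """Assign each memory block to one owning thread.

    access_counts maps block -> {thread_id: number_of_accesses}.
    Assumption: the most frequent accessor becomes the owner, and
    ties go to the lowest thread id. The paper's actual partitioning
    algorithm is not specified in the abstract.
    """
    owners = {}
    for block, per_thread in access_counts.items():
        owners[block] = min(per_thread, key=lambda t: (-per_thread[t], t))
    return owners


def communication_pattern(access_counts, owners):
    """Count accesses that a thread makes to blocks owned by another thread.

    Returns {(accessing_thread, owning_thread): access_count}. Because each
    thread runs on one core, this approximates core-to-core traffic.
    """
    traffic = defaultdict(int)
    for block, per_thread in access_counts.items():
        owner = owners[block]
        for thread, count in per_thread.items():
            if thread != owner:
                traffic[(thread, owner)] += count
    return dict(traffic)


# Hypothetical access counts for three blocks and three threads.
access_counts = {
    "A": {0: 90, 1: 10},
    "B": {0: 5, 1: 80, 2: 15},
    "C": {2: 50},
}
owners = partition_blocks(access_counts)
traffic = communication_pattern(access_counts, owners)
print(owners)
print(traffic)
```

In this sketch, the heaviest entries in `traffic` mark the thread pairs that would benefit most from a dedicated path in a reconfigurable network. A first-touch policy would instead assign each block to whichever thread accesses it first, regardless of how often other threads use it later.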
Keywords :
cache storage; microprocessor chips; multi-threading; multiprocessing systems; network-on-chip; program compilers; reconfigurable architectures; sensor fusion; chip multiprocessors; compiler assisted data partitioning; compiler-assisted data distribution; compiler-based approach; complex centralized network configuration system; data access latency; data access locality; data association; data ownership; distributed cache banks; memory block allocation; multithreaded memory access patterns; nonuniform cache architectures; on-chip network; optimized data placement; partitioning algorithm; runtime approximation; runtime complexity; symbolic MMAP; Arrays; Benchmark testing; Instruction sets; Runtime; Circuit switching; communication; data access pattern; data partition
Journal_Title :
Parallel and Distributed Systems, IEEE Transactions on
DOI :
10.1109/TPDS.2011.279