Title :
Automatic Prefetch and Modulo Scheduling Transformations for the Cell BE Architecture
Author :
Vujic, Nikola ; Gonzàlez, Marc ; Martorell, Xavier ; Ayguadé, Eduard
Author_Institution :
Barcelona Supercomputing Center (BSC), Barcelona, Spain
fDate :
4/1/2010
Abstract :
Ease of programming is one of the main requirements for the broad acceptance of multicore systems that lack hardware support for transparent data transfer between local and global memories. A software cache is a robust approach to providing the user with a transparent view of the memory architecture, but this software approach can suffer from poor performance. In this paper, we propose a hierarchical, hybrid software-cache architecture designed to enable prefetch techniques. Memory accesses are classified at compile time into two classes: high-locality and irregular. Our approach then steers each memory reference toward one of two cache structures, each optimized for its respective access pattern. These cache structures are designed to let high-level compiler optimizations aggressively unroll loops, reorder cache references, and/or transform surrounding loops, practically eliminating the software-cache overhead in the innermost loop. The cache design also enables automatic prefetch and modulo scheduling transformations. Performance evaluation indicates that the optimized software-cache structures, combined with the proposed prefetch techniques, yield speedups of 10 to 20 percent. With the proposed technique, we achieve performance on the Cell BE processor similar to that of a modern server-class multicore such as the IBM PowerPC 970MP processor for a set of parallel NAS applications.
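The abstract describes steering high-locality and irregular references to two different cache structures and hiding transfer latency with prefetch. The C below is a minimal sketch of that general idea under stated assumptions, not the authors' implementation: irregular accesses go through a hypothetical direct-mapped software cache (`cache_read`) that pays a tag check on every access, while high-locality accesses use a hypothetical double-buffered stream (`stream_sum`) that requests block b+1 before computing on block b, a simple two-stage software pipeline in the spirit of prefetch plus modulo scheduling. All names and sizes (`dma_get`, `LINE`, `SETS`, `GLOBAL_N`) are illustrative assumptions, and the DMA is simulated with a synchronous `memcpy`; on a real SPU it would be an asynchronous transfer into the local store.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sizes chosen for illustration only. */
#define GLOBAL_N 1024   /* elements in simulated global (main) memory */
#define LINE     32     /* elements per cache line / stream block */
#define SETS     8      /* lines in the direct-mapped irregular-access cache */

static float global_mem[GLOBAL_N];  /* stands in for main memory */
static int misses = 0;              /* miss counter for the irregular cache */

/* Fill simulated global memory with global_mem[i] = i. */
static void fill_global(void) {
    for (int i = 0; i < GLOBAL_N; i++)
        global_mem[i] = (float)i;
}

/* Simulated DMA get: a synchronous memcpy here. On an SPU this would be an
   asynchronous local-store transfer that can overlap with computation. */
static void dma_get(float *local, int start, int n) {
    memcpy(local, &global_mem[start], (size_t)n * sizeof(float));
}

/* --- Irregular accesses: direct-mapped software cache --- */
static float cache_data[SETS][LINE];
static int cache_tag[SETS];

static void cache_init(void) {
    for (int s = 0; s < SETS; s++)
        cache_tag[s] = -1;  /* -1 marks an empty line */
    misses = 0;
}

/* Every access pays a tag check; a miss fetches the whole line. */
static float cache_read(int idx) {
    assert(idx >= 0 && idx < GLOBAL_N);
    int line = idx / LINE;
    int set = line % SETS;
    if (cache_tag[set] != line) {
        dma_get(cache_data[set], line * LINE, LINE);
        cache_tag[set] = line;
        misses++;
    }
    return cache_data[set][idx % LINE];
}

/* --- High-locality accesses: double-buffered stream with prefetch ---
   Block b+1 is requested before block b is consumed (a two-stage software
   pipeline), and the innermost loop touches only local buffers, with no
   per-access cache lookup. */
static float stream_sum(int n) {
    static float buf[2][LINE];
    assert(n > 0 && n <= GLOBAL_N && n % LINE == 0);
    int nblocks = n / LINE;
    float sum = 0.0f;
    dma_get(buf[0], 0, LINE);                 /* prologue: fetch block 0 */
    for (int b = 0; b < nblocks; b++) {
        if (b + 1 < nblocks)                  /* prefetch the next block */
            dma_get(buf[(b + 1) & 1], (b + 1) * LINE, LINE);
        for (int i = 0; i < LINE; i++)        /* compute on current block */
            sum += buf[b & 1][i];
    }
    return sum;
}
```

After `fill_global()` and `cache_init()`, `stream_sum(GLOBAL_N)` returns the sum of all 1024 elements, 523776, while a gather through `cache_read` returns the same values as direct indexing but increments `misses` whenever a line is not resident. The contrast is the point of the sketch: the tag check inside `cache_read` is the per-access overhead that, per the abstract, the compiler aims to remove from innermost loops for high-locality references.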
Keywords :
cache storage; memory architecture; microprocessor chips; multiprocessing systems; performance evaluation; processor scheduling; Cell BE architecture; Cell BE processor; IBM PowerPC 970MP processor; access pattern; automatic prefetch transformation; cache design; cache references; hardware support; high-level compiler optimizations; hybrid software-cache architecture; memory accesses; memory references; modulo scheduling transformations; multicore systems; parallel NAS applications; prefetch techniques; server-class multicore; software cache; software-cache overhead; software-cache structures; transparent data transfer; multicore processor; local memories; prefetch code generation
Journal_Title :
IEEE Transactions on Parallel and Distributed Systems
DOI :
10.1109/TPDS.2009.97