DocumentCode :
2453872
Title :
Data Access Partitioning for Fine-grain Parallelism on Multicore Architectures
Author :
Chu, Michael ; Ravindran, Rajiv ; Mahlke, Scott
Author_Institution :
Univ. of Michigan, Ann Arbor
fYear :
2007
fDate :
1-5 Dec. 2007
Firstpage :
369
Lastpage :
380
Abstract :
The recent design shift towards multicore processors has spawned a significant amount of research in the area of program parallelization. The future abundance of cores on a single chip requires programmer and compiler intervention to increase the amount of parallel work possible. Much of the recent work has fallen into the areas of coarse-grain parallelization: new programming models and different ways to exploit threads and data-level parallelism. This work focuses on a complementary direction, improving performance through automated fine-grain parallelization. The main difficulty in achieving a performance benefit from fine-grain parallelism is the distribution of data memory accesses across the data caches of each core. Poor choices in the placement of data accesses can lead to increased memory stalls and low resource utilization. We propose a profile-guided method for partitioning memory accesses across distributed data caches. First, a profile determines affinity relationships between memory accesses and working set characteristics of individual memory operations in the program. Next, a program-level partitioning of the memory operations is performed to divide the memory accesses across the data caches. As a result, the data accesses are proactively dispersed to reduce memory stalls and improve computation parallelization. A final detailed partitioning of the computation instructions is performed with knowledge of the cache location of their associated data. Overall, our data partitioning reduces stall cycles by up to 51% versus data-incognizant partitioning, and achieves an average speedup of 30% over a single-core processor.
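The abstract describes a three-step flow: profile memory-access affinities and working sets, partition the memory operations across the per-core data caches, and then partition the computation around the resulting data placement. As a rough illustration of the middle step only, the Python sketch below greedily assigns memory operations to caches by trading profiled affinity against a working-set balance term; the function, scoring heuristic, and toy profile are assumptions made for illustration and are not the partitioning algorithm from the paper.

# Hypothetical sketch of the program-level partitioning step: greedily assign
# profiled memory operations to per-core data caches using pairwise affinity
# weights and working-set estimates. Illustrative only, not the paper's method.

def partition_memory_ops(affinity, working_set, num_caches, capacity):
    """Assign each memory operation to one of `num_caches` data caches.

    affinity    : dict (op_a, op_b) -> profiled co-access weight
    working_set : dict op -> estimated working-set size in bytes
    capacity    : per-cache working-set budget in bytes (soft constraint)
    """
    caches = [{"ops": set(), "load": 0} for _ in range(num_caches)]

    # Place operations with the largest working sets first so balancing is easier.
    for op in sorted(working_set, key=working_set.get, reverse=True):
        best_idx, best_score = 0, None
        for idx, cache in enumerate(caches):
            # Reward co-locating ops that the profile says are accessed together...
            gain = sum(affinity.get((op, other), 0) + affinity.get((other, op), 0)
                       for other in cache["ops"])
            # ...and lightly penalize caches whose working set is filling up.
            balance = (cache["load"] + working_set[op]) / capacity
            score = gain - balance
            if best_score is None or score > best_score:
                best_idx, best_score = idx, score
        caches[best_idx]["ops"].add(op)
        caches[best_idx]["load"] += working_set[op]

    return [sorted(c["ops"]) for c in caches]

if __name__ == "__main__":
    # Toy profile: ops {a, b, c} and {d, e, f} are mostly co-accessed.
    affinity = {("a", "b"): 10, ("b", "c"): 8, ("d", "e"): 9, ("e", "f"): 7, ("c", "d"): 1}
    working_set = {"a": 4096, "b": 2048, "c": 1024, "d": 4096, "e": 2048, "f": 1024}
    print(partition_memory_ops(affinity, working_set, num_caches=2, capacity=8192))
    # -> [['a', 'b', 'c'], ['d', 'e', 'f']]

In this toy run the two profiled clusters end up on separate caches, mirroring the goal stated in the abstract of dispersing data accesses so that computation can be partitioned around its data.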
Keywords :
cache storage; graph theory; multiprocessing systems; parallel processing; affinity relationship; cache location; data access graph; data access partitioning; data-level parallelism; distributed data cache; fine-grain parallelization; memory access partitioning; memory stall; multicore architecture; multicore processors; profile-guided method; program parallelization; resource utilization; Computer architecture; Laboratories; Microarchitecture; Multicore processing; Multiprocessor interconnection networks; Parallel processing; Parallel programming; Program processors; Programming profession; Yarn;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007)
Conference_Location :
Chicago, IL
ISSN :
1072-4451
Print_ISBN :
978-0-7695-3047-5
Electronic_ISBN :
1072-4451
Type :
conf
DOI :
10.1109/MICRO.2007.15
Filename :
4408269