DocumentCode :
2453872
Title :
Data Access Partitioning for Fine-grain Parallelism on Multicore Architectures
Author :
Chu, Michael ; Ravindran, Rajiv ; Mahlke, Scott
Author_Institution :
Univ. of Michigan, Ann Arbor
fYear :
2007
fDate :
1-5 Dec. 2007
Firstpage :
369
Lastpage :
380
Abstract :
The recent design shift towards multicore processors has spawned a significant amount of research in the area of program parallelization. The future abundance of cores on a single chip requires programmer and compiler intervention to increase the amount of parallel work possible. Much of the recent work has fallen into the areas of coarse-grain parallelization: new programming models and different ways to exploit threads and data-level parallelism. This work focuses on a complementary direction, improving performance through automated fine-grain parallelization. The main difficulty in achieving a performance benefit from fine-grain parallelism is the distribution of data memory accesses across the data caches of each core. Poor choices in the placement of data accesses can lead to increased memory stalls and low resource utilization. We propose a profile-guided method for partitioning memory accesses across distributed data caches. First, a profile determines affinity relationships between memory accesses and working set characteristics of individual memory operations in the program. Next, a program-level partitioning of the memory operations is performed to divide the memory accesses across the data caches. As a result, the data accesses are proactively dispersed to reduce memory stalls and improve computation parallelization. A final detailed partitioning of the computation instructions is performed with knowledge of the cache location of their associated data. Overall, our data partitioning reduces stall cycles by up to 51% versus data-incognizant partitioning, and achieves an average speedup of 30% over a single-core processor.
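The abstract describes a three-step flow: profile memory-access affinities and working sets, partition the memory operations across the per-core data caches, and then partition the computation around the resulting data placement. As a rough illustration of the middle step only, the Python sketch below greedily assigns memory operations to caches by trading profiled affinity against a working-set balance term; the function, scoring heuristic, and toy profile are assumptions made for illustration and are not the partitioning algorithm from the paper.

# Hypothetical sketch of the program-level partitioning step: greedily assign
# profiled memory operations to per-core data caches using pairwise affinity
# weights and working-set estimates. Illustrative only, not the paper's method.

def partition_memory_ops(affinity, working_set, num_caches, capacity):
    """Assign each memory operation to one of `num_caches` data caches.

    affinity    : dict (op_a, op_b) -> profiled co-access weight
    working_set : dict op -> estimated working-set size in bytes
    capacity    : per-cache working-set budget in bytes (soft constraint)
    """
    caches = [{"ops": set(), "load": 0} for _ in range(num_caches)]

    # Place operations with the largest working sets first so balancing is easier.
    for op in sorted(working_set, key=working_set.get, reverse=True):
        best_idx, best_score = 0, None
        for idx, cache in enumerate(caches):
            # Reward co-locating ops that the profile says are accessed together...
            gain = sum(affinity.get((op, other), 0) + affinity.get((other, op), 0)
                       for other in cache["ops"])
            # ...and lightly penalize caches whose working set is filling up.
            balance = (cache["load"] + working_set[op]) / capacity
            score = gain - balance
            if best_score is None or score > best_score:
                best_idx, best_score = idx, score
        caches[best_idx]["ops"].add(op)
        caches[best_idx]["load"] += working_set[op]

    return [sorted(c["ops"]) for c in caches]

if __name__ == "__main__":
    # Toy profile: ops {a, b, c} and {d, e, f} are mostly co-accessed.
    affinity = {("a", "b"): 10, ("b", "c"): 8, ("d", "e"): 9, ("e", "f"): 7, ("c", "d"): 1}
    working_set = {"a": 4096, "b": 2048, "c": 1024, "d": 4096, "e": 2048, "f": 1024}
    print(partition_memory_ops(affinity, working_set, num_caches=2, capacity=8192))
    # -> [['a', 'b', 'c'], ['d', 'e', 'f']]

In this toy run the two profiled clusters end up on separate caches, mirroring the goal stated in the abstract of dispersing data accesses so that computation can be partitioned around its data.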
Keywords :
cache storage; graph theory; multiprocessing systems; parallel processing; affinity relationship; cache location; data access graph; data access partitioning; data-level parallelism; distributed data cache; fine-grain parallelization; memory access partitioning; memory stall; multicore architecture; multicore processors; profile-guided method; program parallelization; resource utilization; Computer architecture; Laboratories; Microarchitecture; Multicore processing; Multiprocessor interconnection networks; Parallel processing; Parallel programming; Program processors; Programming profession; Yarn;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007)
Conference_Location :
Chicago, IL
ISSN :
1072-4451
Print_ISBN :
978-0-7695-3047-5
Electronic_ISBN :
1072-4451
Type :
conf
DOI :
10.1109/MICRO.2007.15
Filename :
4408269