DocumentCode :
3145010
Title :
Address Translation Optimization for Unified Parallel C Multi-dimensional Arrays
Author :
Serres, Olivier ; Anbar, Ahmad ; Merchant, Saumil G. ; Kayi, Abdullah ; El-Ghazawi, Tarek
Author_Institution :
Dept. of Electr. & Comput. Eng., George Washington Univ., Washington, DC, USA
fYear :
2011
fDate :
16-20 May 2011
Firstpage :
1191
Lastpage :
1198
Abstract :
Partitioned Global Address Space (PGAS) languages offer significant programmability advantages through their global memory view abstraction, one-sided communication constructs and data locality awareness. These attributes place PGAS languages at the forefront of possible solutions to the exploding programming complexity of many-core architectures. To enable the shared address space abstraction, PGAS languages use an address translation mechanism that converts shared addresses to physical addresses whenever shared memory is accessed. This mechanism is already expensive in distributed memory environments, but it becomes a major bottleneck in machines with shared memory support, where access latencies are significantly lower. Multi- and many-core processors exhibit even lower latencies for shared data due to on-chip cache space utilization. Efficient handling of address translation therefore becomes even more crucial, as this overhead may easily become the dominant factor in the overall data access time on such architectures. To alleviate address translation overhead, this paper introduces a new mechanism targeting multi-dimensional arrays, which are used in most scientific and image processing applications. Relative costs and the implementation details for UPC are evaluated with different workloads (matrix multiplication, the Random Access benchmark and Sobel edge detection) on two different platforms: a many-core system, the TILE64 (a 64-core processor), and a dual-socket, quad-core Intel Nehalem system (up to 16 threads). Our optimization provides substantial performance improvements, up to 40x. In addition, the proposed mechanism can easily be integrated into compilers, hiding it from programmers. This improves UPC productivity by reducing the manual optimization effort required to minimize the address translation overhead.
Keywords :
C language; cache storage; distributed memory systems; distributed shared memory systems; parallel architectures; parallel programming; parallelising compilers; storage allocation; PGAS language; Random Access benchmark; Sobel edge detection; TILE64; access latency; address translation mechanism; address translation optimization; address translation overhead; data access; data locality awareness; distributed memory environment; dual-socket quad-core Intel Nehalem system; global memory view abstraction; many-core architecture; many-core processors; matrix multiplication; on-chip cache space utilization; one-sided communication construct; parallel programming; partitioned global address space language; physical address; programming complexity; shared address space abstraction; shared data; shared memory access; shared memory support; unified parallel C multidimensional array; Arrays; Electronics packaging; Instruction sets; Optimization; Table lookup; Tiles;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW), 2011 IEEE International Symposium on
Conference_Location :
Shanghai
ISSN :
1530-2075
Print_ISBN :
978-1-61284-425-1
Electronic_ISBN :
1530-2075
Type :
conf
DOI :
10.1109/IPDPS.2011.279
Filename :
6008969