DocumentCode
2441327
Title
Structuring the execution of OpenMP applications for multicore architectures
Author
Broquedis, François ; Aumage, Olivier ; Goglin, Brice ; Thibault, Samuel ; Wacrenier, Pierre-André ; Namyst, Raymond
Author_Institution
INRIA, Univ. of Bordeaux, Talence, France
fYear
2010
fDate
19-23 April 2010
Firstpage
1
Lastpage
10
Abstract
The now commonplace multicore chips have introduced, by design, a deep hierarchy of memory and cache banks within parallel computers as a tradeoff between the user-friendliness of shared memory on the one hand, and memory access scalability and efficiency on the other. However, getting high performance out of such machines requires a dynamic mapping of application tasks and data onto the underlying architecture. Moreover, depending on the application behavior, this mapping should favor cache affinity, memory bandwidth, computation synchrony, or a combination of these. The great challenge is then to perform this hardware-dependent mapping in a portable, abstract way. To meet this need, we propose a new, hierarchical approach to the execution of OpenMP threads on multicore machines. Our ForestGOMP runtime system dynamically generates structured trees out of OpenMP programs. It also collects relationship information about threads and data. The scheduler uses this information, together with scheduling hints and hardware counter feedback, to select the most appropriate thread and data distribution. ForestGOMP features a high-level platform for developing and tuning portable thread schedulers. We present several applications for which we developed specific scheduling policies that achieve excellent speedups on 16-core machines.
Keywords
application program interfaces; cache storage; microprocessor chips; multiprocessing systems; parallel architectures; shared memory systems; ForestGOMP runtime system; OpenMP threads; application tasks; cache banks; data distribution; hardware counter feedback; memory access scalability; memory banks; multicore architectures; multicore chips; parallel computers; scheduling hints; Application software; Bandwidth; Computer architecture; Concurrent computing; Counting circuits; Hardware; Human computer interaction; Multicore processing; Scalability; Yarn;
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel & Distributed Processing (IPDPS), 2010 IEEE International Symposium on
Conference_Location
Atlanta, GA
ISSN
1530-2075
Print_ISBN
978-1-4244-6442-5
Type
conf
DOI
10.1109/IPDPS.2010.5470442
Filename
5470442