DocumentCode :
3326714
Title :
Neither more nor less: optimizing thread-level parallelism for GPGPUs
Author :
Arnau, Jose-Maria ; Parcerisa, Joan-Manuel ; Xekalakis, Polychronis
Author_Institution :
Comput. Archit. Dept., Univ. Politec. de Catalunya, Barcelona, Spain
fYear :
2013
fDate :
7-11 Sept. 2013
Firstpage :
157
Lastpage :
166
Abstract :
Perhaps one of the most important design aspects for smartphones and tablets is improving their energy efficiency. Unfortunately, rich media content applications typically put significant pressure on the GPU's memory subsystem. In this paper we propose a novel means of dramatically improving the energy efficiency of these devices for this popular class of applications. The main hurdle in doing so is that GPUs require a significant amount of memory bandwidth in order to fetch all the necessary textures from memory. Although consecutive frames tend to operate on the same textures, their re-use distances are so large that, to the caches, fetching textures appears to be a streaming operation. Traditional designs increase the degree of multi-threading and the memory bandwidth as a means of improving performance. In order to meet the energy efficiency standards required by the mobile market, we need a different approach. We thus propose a technique which we term Parallel Frame Rendering (PFR). Under PFR, we split the GPU into two clusters where two consecutive frames are rendered in parallel. PFR exploits the high degree of similarity between consecutive frames to save memory bandwidth by improving texture locality. Since the physics part of the rendering has to be computed sequentially for two consecutive frames, this naturally leads to an increase in input latency for PFR compared with traditional systems. However, we argue that this is rarely an issue, as the user interface in these devices is much slower than that of desktop systems. Moreover, we show that we can design reactive forms of PFR that allow us to bound the lag observed by the end user, thus maintaining the best user experience when necessary. Overall, we show that PFR can achieve 28% memory bandwidth savings with only minimal loss in system responsiveness.
Keywords :
cache storage; graphical user interfaces; graphics processing units; integrated circuit design; mobile computing; multi-threading; notebook computers; performance evaluation; rendering (computer graphics); caches fetching textures; consecutive frames; design aspects; energy efficiency improvement; memory bandwidth improvement; memory subsystem; mobile GPU; multithreading degree improvement; parallel frame rendering; performance improvement; rich media content applications; smartphones; streaming operation; tablets; texture locality improvement; trading responsiveness; user interface; Bandwidth; Graphics processing units; Memory management; Mobile communication; Rendering (computer graphics); Switches; Tiles; GPGPUs; scheduling; thread-level parallelism;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel Architectures and Compilation Techniques (PACT), 2013 22nd International Conference on
Conference_Location :
Edinburgh
ISSN :
1089-795X
Print_ISBN :
978-1-4799-1018-2
Type :
conf
DOI :
10.1109/PACT.2013.6618806
Filename :
6618806