DocumentCode :
3350646
Title :
The effect of program optimization on trace cache efficiency
Author :
Howard, Derek L. ; Lipasti, Mikko H.
Author_Institution :
Server Group, IBM Corp., Rochester, MN, USA
fYear :
1999
fDate :
1999
Firstpage :
256
Lastpage :
261
Abstract :
Trace cache, an instruction fetch technique that reduces taken branch penalties by storing and fetching program instructions in dynamic execution order, dramatically improves instruction fetch bandwidth. Similarly, program transformations like loop unrolling, procedure inlining, feedback-directed program restructuring, and profile-directed feedback can improve instruction fetch bandwidth by changing the static structure and ordering of a program's basic blocks. We examine the interaction of these compile-time and run-time techniques in the context of a high-quality production compiler that implements such transformations and a cycle-accurate simulation model of a wide issue superscalar processor. Not surprisingly, we find that the relative benefit of adding trace cache declines with increasing optimization level, and vice versa. Furthermore, we find that certain optimizations that improve performance on a processor model without trace cache can actually degrade performance on a processor with trace cache due to increased branch history table interference. Finally, we show that the performance obtained with a trace cache of a given size can be obtained with a trace cache of about half the size by applying aggressive compiler optimization techniques.
Keywords :
cache storage; optimising compilers; virtual machines; aggressive compiler optimization techniques; branch history table interference; compile-time techniques; cycle-accurate simulation model; dynamic execution order; feedback-directed program restructuring; high-quality production compiler; instruction fetch bandwidth; loop unrolling; procedure inlining; processor model; profile-directed feedback; program instruction fetching; program instruction storage; program optimization; program transformations; run-time techniques; static ordering; static structure; taken branch penalty reduction; trace cache efficiency; wide issue superscalar processor; Bandwidth; Context modeling; Degradation; Feedback loop; History; Optimizing compilers; Production; Program processors; Runtime
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel Architectures and Compilation Techniques, 1999. Proceedings. 1999 International Conference on
Conference_Location :
Newport Beach, CA
ISSN :
1089-795X
Print_ISBN :
0-7695-0425-6
Type :
conf
DOI :
10.1109/PACT.1999.807570
Filename :
807570