DocumentCode
769401
Title
Balancing reuse opportunities and performance gains with subblock value reuse
Author
Huang, Jian; Lilja, D.J.
Author_Institution
Sun Microsystems, Bloomington, MN, USA
Volume
52
Issue
8
fYear
2003
Firstpage
1032
Lastpage
1050
Abstract
The fact that instructions in programs often produce repetitive results has motivated researchers to explore techniques such as value prediction and value reuse to exploit this behavior. Value prediction improves the available instruction-level parallelism (ILP) in superscalar processors by allowing dependent instructions to execute speculatively after the values of their input operands have been predicted. Value reuse, on the other hand, eliminates redundant computation by storing the previously produced results of instructions and skipping the execution of redundant instructions. Previous value reuse mechanisms use a single instruction or a naturally formed instruction group, such as a basic block, a trace, or a function, as the reuse unit. These naturally formed instruction groups are readily identifiable by the hardware at runtime without compiler assistance. However, the performance potential of a value reuse mechanism depends on its reuse detection time, the number of reuse opportunities, and the amount of work saved by skipping each reuse unit. Larger instruction groups typically have fewer reuse opportunities than smaller groups, but each successful reuse saves more work, so it is important to find the balance point that provides the largest overall performance gain. We propose a new mechanism called subblock reuse. Subblocks are created by slicing basic blocks either dynamically or with compiler guidance. The dynamic approaches use the number of instructions, the number of inputs and outputs, or the presence of store instructions to determine the subblock boundaries. The compiler-assisted approach slices basic blocks using data-flow considerations to balance the reuse granularity against the number of reuse opportunities.
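The dynamic slicing heuristics described in the abstract can be illustrated with a short sketch. The Python fragment below is only a minimal illustration under assumed parameters: the thresholds MAX_INSNS and MAX_INPUTS, the toy Insn record, and the ReuseBuffer class are all hypothetical names chosen here, not the paper's hardware mechanism. It sketches the idea of cutting a basic block into subblocks by instruction count, live-in count, and store boundaries, and of skipping a subblock whose inputs recur by supplying its recorded outputs.

from typing import Dict, List, Optional, Tuple

MAX_INSNS = 4    # assumed cap on subblock length (illustrative)
MAX_INPUTS = 3   # assumed cap on distinct live-in registers (illustrative)

class Insn:
    def __init__(self, op: str, dst: Optional[str], srcs: Tuple[str, ...]):
        self.op = op      # e.g. "add", "mul", "store"
        self.dst = dst    # destination register, or None for a store
        self.srcs = srcs  # source registers

def slice_basic_block(block: List[Insn]) -> List[List[Insn]]:
    """Cut a basic block into subblocks with simple dynamic heuristics:
    close the current subblock when it reaches MAX_INSNS instructions,
    when adding an instruction would exceed MAX_INPUTS live-in values,
    or immediately after a store (so memory state never has to be
    captured in the reuse buffer)."""
    subblocks, current = [], []
    defined, live_ins = set(), set()
    for insn in block:
        new_ins = {s for s in insn.srcs if s not in defined}
        if current and (len(current) >= MAX_INSNS or
                        len(live_ins | new_ins) > MAX_INPUTS):
            subblocks.append(current)
            current, defined, live_ins = [], set(), set()
            new_ins = set(insn.srcs)
        current.append(insn)
        live_ins |= new_ins
        if insn.dst is not None:
            defined.add(insn.dst)
        if insn.op == "store":
            subblocks.append(current)
            current, defined, live_ins = [], set(), set()
    if current:
        subblocks.append(current)
    return subblocks

class ReuseBuffer:
    """Maps (subblock start PC, live-in values) to live-out values so a
    recurring subblock can be skipped and its outputs supplied directly."""
    def __init__(self) -> None:
        self.table: Dict[tuple, Dict[str, int]] = {}

    def _key(self, pc: int, live_in_vals: Dict[str, int]) -> tuple:
        return (pc, tuple(sorted(live_in_vals.items())))

    def lookup(self, pc: int, live_in_vals: Dict[str, int]):
        # Returns the recorded live-out values on a hit, or None on a miss.
        return self.table.get(self._key(pc, live_in_vals))

    def record(self, pc: int, live_in_vals: Dict[str, int],
               live_out_vals: Dict[str, int]) -> None:
        self.table[self._key(pc, live_in_vals)] = dict(live_out_vals)

A compiler-assisted variant, as the abstract notes, would instead choose the cut points offline using data-flow information, marking subblock boundaries the hardware can recognize so that each subblock keeps few live-ins and live-outs while still covering enough work to be worth skipping.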
Keywords
data flow analysis; instruction sets; parallel programming; performance evaluation; program compilers; program slicing; software reusability; compiler flow analysis; data-flow; instruction-level parallelism; performance gains; program compiler; redundant instructions; reuse granularity; runtime; subblock value reuse; superscalar processors; value prediction; value reuse; Bandwidth; Computer aided instruction; Delay; Hardware; Helium; Parallel processing; Performance gain; Potential well; Program processors; Runtime;
fLanguage
English
Journal_Title
IEEE Transactions on Computers
Publisher
IEEE
ISSN
0018-9340
Type
jour
DOI
10.1109/TC.2003.1223638
Filename
1223638
Link To Document