Title :
Unrolling shape for out-of-order processors
Author_Institution :
Inf. Technol. Center, Tokyo Univ., Japan
Abstract :
Loop unrolling is today one of the most effective optimizations for modern architectures. The unrolling shape was proposed to provide an analytical model of loop-unrolling performance. It was applied to in-order processors and was proved to give an accurate performance model of loop unrolling in terms of software pipelining and cache-miss alleviation. In this paper, we apply the unrolling shape to out-of-order processors. We present a scheme for calculating $P_{LOOO}(l)$, the pipelining term of a loop unrolled by a factor $l$, as $P_{LOOO}(l) = \left(N_{ins}(l)/F + N_{Occpy}(l)\right)/l$, where $N_{ins}(l)$ is the number of instructions in the loop unrolled by a factor $l$, $F$ is the fetch rate of the architecture, and $N_{Occpy}(l)$ is the number of store instructions scheduled after the $(N_{ins}(l)/F)$-th cycle. The pipelining term for in-order processors is required to calculate $N_{Occpy}(l)$, so the scheme for out-of-order processors builds on the unrolling shape for in-order processors. Experiments show that our scheme accurately calculates the behaviour of loop unrolling on out-of-order processors. We also show that our scheme quantifies the effect of loop unrolling on out-of-order processors as that of infinitely unrolled loops on in-order processors. Furthermore, we show that the old folklore that loop unrolling reduces loop overhead (Aho et al., 1986) holds again on out-of-order processors, where it appears as the performance improvement factor $\frac{d}{dl}P_{LOOO}(l)$.
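To make the formula concrete, the following is a minimal Python sketch that evaluates $P_{LOOO}(l)$ for hypothetical inputs. The linear instruction count $N_{ins}(l) = 6l + 2$ (six body instructions per unrolled copy plus two loop-overhead instructions), the fetch rate $F = 4$ and the constant $N_{Occpy}(l) = 1$ are illustrative assumptions, not values from the paper; in the paper's scheme $N_{Occpy}(l)$ is derived from the in-order pipelining term, which this sketch does not model. The forward difference $P_{LOOO}(l+1) - P_{LOOO}(l)$ is used here only as a discrete stand-in for $\frac{d}{dl}P_{LOOO}$.

```python
"""Worked evaluation of P_LOOO(l) = (N_ins(l)/F + N_Occpy(l)) / l.

All inputs below are hypothetical. N_Occpy(l) is taken as a given constant
rather than derived from the in-order pipelining term.
"""


def p_looo(n_ins: float, fetch_rate: float, n_occpy: float, l: int) -> float:
    """Per-iteration pipelining term for a loop unrolled by factor l."""
    if l < 1:
        raise ValueError("unroll factor l must be >= 1")
    if fetch_rate <= 0:
        raise ValueError("fetch rate F must be positive")
    return (n_ins / fetch_rate + n_occpy) / l


def n_ins_linear(l: int, body: int = 6, overhead: int = 2) -> int:
    """Hypothetical instruction count: `body` per copy plus fixed loop overhead."""
    return body * l + overhead


def p_looo_model(l: int, fetch_rate: float = 4, n_occpy: float = 1) -> float:
    """P_LOOO(l) under the hypothetical linear N_ins model and constant N_Occpy."""
    return p_looo(n_ins_linear(l), fetch_rate, n_occpy, l)


def forward_difference(l: int, fetch_rate: float = 4, n_occpy: float = 1) -> float:
    """Discrete stand-in for d/dl P_LOOO: P_LOOO(l+1) - P_LOOO(l)."""
    return (p_looo_model(l + 1, fetch_rate, n_occpy)
            - p_looo_model(l, fetch_rate, n_occpy))


if __name__ == "__main__":
    for l in (1, 2, 4, 8):
        print(f"l={l}: P_LOOO={p_looo_model(l)}, step={forward_difference(l)}")
```

With these numbers $P_{LOOO}(l) = 1.5 + 1.5/l$, so the term falls from 3.0 at $l = 1$ to 1.6875 at $l = 8$: the fixed overhead instructions and the $N_{Occpy}$ stores are amortized over more copies of the body, and the term approaches the body-only cost of 6/F = 1.5 as $l$ grows.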
Keywords :
instruction sets; optimising compilers; pipeline processing; program control structures; analytical model; cache miss alleviation; computer architectures; in-order processors; instruction set; loop unrolling performance; optimization; out-of-order processors; performance model; software pipelining; Out of order; Shape;
Conference_Title :
Innovative Architecture for Future Generation High-Performance Processors and Systems, 2003
Print_ISBN :
0-7695-2019-7
DOI :
10.1109/IWIA.2003.1262786