A run-time modulo scheduling by using a binary translation mechanism

Author

Ferreira, Ricardo ; Denver, Waldir ; Pereira, Manuela ; Quadros, Jorge ; Carro, Luigi ; Wong, Simon

Author_Institution

Dept. Inf., UFV, Vicosa, Brazil

fYear

2014

fDate

14-17 July 2014

Firstpage

75

Lastpage

82

Abstract

It is well known that innermost loop optimizations have a large effect on total execution time. Although coarse-grained reconfigurable architectures (CGRAs) are widely used for this type of optimization, their use at run-time has been limited by the overheads of application analysis, code transformation, and reconfiguration, steps that are normally performed at compile time. In this work, we present the first dynamic translation technique for the modulo scheduling approach that can convert binary code on-the-fly to run on a CGRA. The proposed mechanism ensures software compatibility, as it supports different source ISAs. As a proof of concept of scaling, a change in memory bandwidth has been evaluated (from one memory access per cycle to two memory accesses per cycle). Moreover, the approach has been compared with state-of-the-art static compiler-based approaches for inner loop accelerators, using CGRA and VLIW as target architectures. Additionally, to measure area and performance, the proposed CGRA was prototyped on an FPGA. The area comparisons show that the crossbar CGRA (with 16 processing elements) is 1.9x larger than a 4-issue VLIW softcore processor and 1.3x smaller than an 8-issue VLIW softcore processor. In addition, it reaches overall speedup factors of 2.17x and 2.0x over the 4-issue and 8-issue VLIW processors, respectively. Our results also demonstrate that the run-time algorithm can reach a near-optimal ILP rate, better than an off-line compiler approach for an n-issue VLIW processor.

Keywords

field programmable gate arrays; multiprocessing systems; parallel architectures; processor scheduling; program compilers; program control structures; CGRA; FPGA; VLIW processor; application analysis; binary code on-the-fly; binary translation mechanism; code transformation; dynamic translation technique; inner loop accelerators; loop optimizations; memory access; memory bandwidth; processing elements; reconfiguration; run-time modulo scheduling; software compatibility; source ISA; speedup factor; static compiler-based approaches; target architectures; total execution time; Clocks; Computational modeling; Computer architecture; Registers; Software; VLIW; Vectors;

fLanguage

English

Publisher

IEEE

Conference_Titel

Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIV), 2014 International Conference on

Conference_Location

Agios Konstantinos

Type

conf

DOI

10.1109/SAMOS.2014.6893197

Filename

6893197