Title :
A run-time modulo scheduling by using a binary translation mechanism
Author :
Ferreira, Ricardo ; Denver, Waldir ; Pereira, Manuela ; Quadros, Jorge ; Carro, Luigi ; Wong, Simon
Author_Institution :
Dept. Inf., UFV, Vicosa, Brazil
Abstract :
It is well known that innermost loop optimizations have a large effect on total execution time. Although coarse-grained reconfigurable architectures (CGRAs) are widely used for this type of optimization, their use at run-time has been limited by the overheads of application analysis, code transformation, and reconfiguration. These steps are normally performed at compile time. In this work, we present the first dynamic translation technique for the modulo scheduling approach that can convert binary code on-the-fly to run on a CGRA. The proposed mechanism ensures software compatibility, as it supports different source ISAs. As a proof of concept of scaling, a change in memory bandwidth has been evaluated (from one memory access per cycle to two memory accesses per cycle). Moreover, the mechanism is compared with state-of-the-art static compiler-based approaches for inner loop accelerators, using CGRA and VLIW as target architectures. Additionally, to measure area and performance, the proposed CGRA was prototyped on an FPGA. The area comparisons show that the crossbar CGRA (with 16 processing elements) is 1.9x larger than a 4-issue VLIW softcore processor and 1.3x smaller than an 8-issue VLIW softcore processor. In addition, it reaches overall speedup factors of 2.17x and 2.0x over the 4-issue and 8-issue VLIW processors, respectively. Our results also demonstrate that the run-time algorithm can reach a near-optimal ILP rate, better than an off-line compiler approach for an n-issue VLIW processor.
Keywords :
field programmable gate arrays; multiprocessing systems; parallel architectures; processor scheduling; program compilers; program control structures; CGRA; FPGA; VLIW processor; application analysis; binary code on-the-fly; binary translation mechanism; code transformation; dynamic translation technique; inner loop accelerators; loop optimizations; memory access; memory bandwidth; processing elements; reconfiguration; run-time modulo scheduling; software compatibility; source ISA; speedup factor; static compiler-based approaches; target architectures; total execution time; Clocks; Computational modeling; Computer architecture; Registers; Software; VLIW; Vectors;
Conference_Title :
Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIV), 2014 International Conference on
Conference_Location :
Agios Konstantinos
DOI :
10.1109/SAMOS.2014.6893197