Fast Dynamic Binary Rewriting for flexible thread migration on shared-ISA heterogeneous MPSoCs

Author

Georgakoudis, Giorgis ; Nikolopoulos, Dimitrios S. ; Vandierendonck, Hans ; Lalis, Spyros

Author_Institution

Queen's Univ. of Belfast, Belfast, UK

fYear

2014

fDate

14-17 July 2014

Firstpage

156

Lastpage

163

Abstract

Heterogeneous MPSoCs where different types of cores share a baseline ISA but implement different operational accelerators combine programmability with flexible customization. They hold promise for high performance under power and area limitations. However, transparent binary execution and dynamic scheduling are hard on those platforms. The state-of-the-art approach for transparent accelerated execution is fault-and-migrate (FAM): when a thread executes an accelerating instruction unavailable on the host core, it is forcibly migrated to an accelerating core which implements the instruction natively. Unfortunately, this approach prohibits dynamic scheduling through flexible thread migration, which is essential to any asymmetric platform for efficient utilization of heterogeneous resources. We present two distinct binary-level techniques, Dynamic Binary Rewriting (DBR) and Dynamic Binary Translation (DBT), which enable selective acceleration while preserving transparent thread execution and migration to any core in the system, at any point in time. DBR rewrites binary code to exploit any accelerating instructions available in the host core. DBT implements a fault-and-rewrite scheme, which sets up trampolines to emulation routines for those accelerating instructions that are not available on the host core. Both methods customize binary code on demand, enabling flexible migration. We evaluate the overhead of DBR and DBT against FAM on a real hardware shared-ISA MPSoC prototype. Experiments with single-thread programs show flexible migration is possible with manageable overhead. We measure the performance of our binary-level techniques by artificially triggering periodic thread migration between a Base and an accelerating (ACC) core. Periodic migration, without aiming for optimized scheduling, results in an average slowdown of about 40% under DBR or about 10% under DBT, compared to FAM-driven scheduling.
We also show results for a speedup-proportional dynamic schedule, enabled by our techniques, using multi-program workloads. In this case, up to 50% faster execution times can be achieved by leveraging flexible thread migration.

Keywords

multiprocessing systems; processor scheduling; system-on-chip; DBR; DBT; FAM; dynamic binary rewriting; dynamic binary translation; dynamic scheduling; fault-and-migrate; flexible thread migration; shared-ISA heterogeneous MPSoCs; Acceleration; Benchmark testing; Computer architecture; Distributed Bragg reflectors; Dynamic scheduling; Emulation; Software

fLanguage

English

Publisher

IEEE

Conference_Titel

Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIV), 2014 International Conference on

Conference_Location

Agios Konstantinos

Type

conf

DOI

10.1109/SAMOS.2014.6893207

Filename

6893207

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=229161