DocumentCode
229161
Title
Fast Dynamic Binary Rewriting for flexible thread migration on shared-ISA heterogeneous MPSoCs
Author
Georgakoudis, Giorgis ; Nikolopoulos, Dimitrios S. ; Vandierendonck, Hans ; Lalis, Spyros
Author_Institution
Queen's Univ. of Belfast, Belfast, UK
fYear
2014
fDate
14-17 July 2014
Firstpage
156
Lastpage
163
Abstract
Heterogeneous MPSoCs, where different types of cores share a baseline ISA but implement different operational accelerators, combine programmability with flexible customization. They hold promise for high performance under power and area limitations. However, transparent binary execution and dynamic scheduling are hard on those platforms. The state-of-the-art approach for transparent accelerated execution is fault-and-migrate (FAM): when a thread executes an accelerating instruction unavailable on the host core, it is forcibly migrated to an accelerating core which implements the instruction natively. Unfortunately, this approach prohibits dynamic scheduling through flexible thread migration, which is essential to any asymmetric platform for efficient utilization of heterogeneous resources. We present two distinct binary-level techniques, Dynamic Binary Rewriting (DBR) and Dynamic Binary Translation (DBT), which enable selective acceleration while preserving transparent thread execution and migration to any core in the system, at any point in time. DBR rewrites binary code to exploit any accelerating instructions available in the host core. DBT implements a fault-and-rewrite scheme, which sets up trampolines to emulation routines for those accelerating instructions that are not available on the host core. Both methods customize binary code on demand, enabling flexible migration. We evaluate the overhead of DBR and DBT against FAM on a real hardware shared-ISA MPSoC prototype. Experiments with single-thread programs show flexible migration is possible with manageable overhead. We measure the performance of our binary-level techniques by artificially triggering periodic thread migration between a Base and an accelerating (ACC) core. Periodic migration, without aiming for optimized scheduling, results in an average slowdown of about 40% under DBR or about 10% under DBT, compared to FAM-driven scheduling. We also show results for a speedup-proportional dynamic scheduler, enabled by our techniques, using multi-program workloads. In this case, up to 50% faster execution times can be achieved by leveraging flexible thread migration.
Keywords
multiprocessing systems; processor scheduling; system-on-chip; DBR; DBT; FAM; dynamic binary rewriting; dynamic binary translation; dynamic scheduling; fault-and-migrate; flexible thread migration; shared-ISA heterogeneous MPSoCs; Acceleration; Benchmark testing; Computer architecture; Distributed Bragg reflectors; Dynamic scheduling; Emulation; Software
fLanguage
English
Publisher
IEEE
Conference_Titel
Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIV), 2014 International Conference on
Conference_Location
Agios Konstantinos
Type
conf
DOI
10.1109/SAMOS.2014.6893207
Filename
6893207