DocumentCode :
3571322
Title :
Global Optimization of Execution Mode Selection for the Reconfigurable PRAM-NUMA Multicore Architecture REPLICA
Author :
Hansson, Erik ; Kessler, Christoph
Author_Institution :
Dept. of Comput. & Inf. Sci., Linköping Univ., Linköping, Sweden
fYear :
2014
Firstpage :
322
Lastpage :
328
Abstract :
The REPLICA architecture is a massively hardware-threaded very long instruction word (VLIW) architecture. REPLICA has two execution modes, PRAM and NUMA, which are supported by the underlying on-chip memory and can be switched between at runtime. PRAM mode is considered the standard execution mode and mainly targets applications with very high thread-level parallelism (TLP). In contrast, NUMA mode is intended for sequential legacy applications and applications with a low amount of TLP, although in some cases very regular applications suit NUMA mode as well. However, there is a switching cost between the modes which is not negligible. We combine machine learning (symbolic regression) with the shortest path problem to optimize software composition of parameterized stencil-like algorithms, which have regular control flow and memory access patterns. Using the tool Eureqa Pro, which is based on symbolic regression, together with training data, we create predictors of execution time for parameterized software components. We use the predictors to formulate an optimization problem based on shortest path that maps component execution onto the available modes (PRAM or NUMA). When composing three randomly selected components from an evaluation set, we get speedups of up to 2.9 times and an average speedup of 1.4, both including overhead. The overhead costs, which include running the predictors, solving the shortest path problem, and switching to the selected runtime modes, are just a few percent.
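The shortest-path formulation in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a fixed sequence of components, one predicted execution time per component and mode, and a single constant switching cost. The function name `select_modes`, the input layout, and all numbers are hypothetical. Under these assumptions the problem is a shortest path through a layered graph with one node per (component, mode) pair, which a simple dynamic program solves.

```python
from typing import Dict, List, Tuple

MODES = ("PRAM", "NUMA")


def select_modes(
    predicted: List[Dict[str, float]],
    switch_cost: float,
    start_mode: str = "PRAM",
) -> Tuple[float, List[str]]:
    """Pick one mode per component so that total predicted time is minimal.

    predicted[i][m] is the (hypothetical) predicted time of component i in
    mode m; switch_cost is an assumed constant cost for changing mode.
    """
    # cost[m]: cheapest total time so far with the machine currently in mode m
    cost = {m: (0.0 if m == start_mode else float("inf")) for m in MODES}
    path: Dict[str, List[str]] = {m: [] for m in MODES}

    for times in predicted:
        new_cost: Dict[str, float] = {}
        new_path: Dict[str, List[str]] = {}
        for m in MODES:
            # Cheapest way to arrive in mode m: stay, or switch from the other mode
            prev = min(
                MODES,
                key=lambda p: cost[p] + (0.0 if p == m else switch_cost),
            )
            arrive = cost[prev] + (0.0 if prev == m else switch_cost)
            new_cost[m] = arrive + times[m]
            new_path[m] = path[prev] + [m]
        cost, path = new_cost, new_path

    end = min(MODES, key=lambda m: cost[m])
    return cost[end], path[end]


# Hypothetical predictor outputs for three components (arbitrary time units)
example = [
    {"PRAM": 10, "NUMA": 4},
    {"PRAM": 3, "NUMA": 9},
    {"PRAM": 8, "NUMA": 2},
]
total, modes = select_modes(example, switch_cost=1)
print(total, modes)
```

With these made-up numbers the cheapest plan switches mode before every component. Raising `switch_cost` far enough makes staying in the start mode throughout the better choice, which is the trade-off the abstract describes.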
Keywords :
combinatorial mathematics; learning (artificial intelligence); multiprocessing systems; optimisation; parallel processing; reconfigurable architectures; Eureqa Pro tool; REPLICA architecture; TLP; VLIW architecture; control flow; execution mode selection; machine learning; memory access pattern; on-chip memory; optimization problem; parallel random access machine; parameterized stencil-like algorithm; reconfigurable PRAM-NUMA multicore architecture; sequential legacy application; shortest path problem; software composition; symbolic regression; very high thread level parallelism; very long instruction word; Computer architecture; Instruction sets; Optimization; Phase change random access memory; Runtime; Switches; NUMA; PRAM; machine-learning; multicore; optimized composition; software composition;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computing and Networking (CANDAR), 2014 Second International Symposium on
Type :
conf
DOI :
10.1109/CANDAR.2014.72
Filename :
7052204