Exploring hardware support for scaling irregular applications on multi-node multi-core architectures

Author

Secchi, Simone ; Ceriani, Marco ; Tumeo, Antonino ; Villa, Oreste ; Palermo, Gianluca ; Raffo, Luigi

Author_Institution

DIEE, Univ. degli Studi di Cagliari, Cagliari, Italy

fYear

2013

fDate

5-7 June 2013

Firstpage

309

Lastpage

313

Abstract

With the recent emergence of large-scale knowledge discovery, data mining and social network analysis, irregular applications have gained renewed interest. Cache-based architectures do not provide optimal performance with such workloads, mainly due to the low spatial and temporal locality of their control and memory access patterns. This paper presents a multi-node, multi-core, multi-threaded shared-memory system architecture designed for the execution of large-scale irregular applications, and built on top of three pillars that support these workloads. First, transparent hardware support for Partitioned Global Address Space (PGAS) provides a large globally-shared address space with no software library overhead. Second, multithreaded multi-core processing nodes achieve the latency tolerance required when accessing physically distributed global memory. Third, hardware support is provided for inter-thread synchronization on the global address space. An analytical performance model that accounts for the main architecture and application characteristics is presented. The hardware design of the proposed custom architectural building blocks is then described. Finally, a multi-board FPGA prototype of the proposed system with typical irregular kernels and benchmarks is presented. The experimental evaluation demonstrates the architecture's performance scalability for different configurations of the whole system.

Keywords

field programmable gate arrays; multi-threading; parallel architectures; shared memory systems; synchronisation; PGAS; architecture performance scalability; cache-based architecture; custom architectural building block; data mining; globally-shared address space; interthread synchronization; irregular application scaling; knowledge discovery; memory access pattern; multiboard FPGA prototype; multicore shared-memory system architecture; multinode multicore architecture; multinode shared-memory system architecture; multithreaded multicore processing node; multithreaded shared-memory system architecture; partitioned global address space; physically distributed global memory; social network analysis; transparent hardware support; Bandwidth; Computer architecture; Hardware; Instruction sets; Prototypes; System-on-chip;

fLanguage

English

Publisher

IEEE

Conference_Title

Application-Specific Systems, Architectures and Processors (ASAP), 2013 IEEE 24th International Conference on

Conference_Location

Washington, DC

ISSN

2160-0511

Print_ISBN

978-1-4799-0494-5

Type

conf

DOI

10.1109/ASAP.2013.6567595

Filename

6567595