• DocumentCode
    624374
  • Title

    Exploring hardware support for scaling irregular applications on multi-node multi-core architectures

  • Author

    Secchi, Simone ; Ceriani, Marco ; Tumeo, Antonino ; Villa, Oreste ; Palermo, Gianluca ; Raffo, Luigi

  • Author_Institution
    DIEE, Univ. degli Studi di Cagliari, Cagliari, Italy
  • fYear
    2013
  • fDate
    5-7 June 2013
  • Firstpage
    309
  • Lastpage
    313
  • Abstract
    With the recent emergence of large-scale knowledge discovery, data mining and social network analysis, irregular applications have gained renewed interest. Cache-based architectures do not provide optimal performance with such workloads, mainly due to the low spatial and temporal locality of their control and memory access patterns. This paper presents a multi-node, multi-core, multi-threaded shared-memory system architecture designed for the execution of large-scale irregular applications, and built on top of three pillars that support these workloads. First, transparent hardware support for Partitioned Global Address Space (PGAS) provides a large globally-shared address space with no software library overhead. Second, multithreaded multi-core processing nodes achieve the latency tolerance required when accessing physically distributed global memory. Third, hardware support is provided for inter-thread synchronization on the global address space. An analytical performance model that accounts for the main architecture and application characteristics is presented. The hardware design of the proposed custom architectural building blocks is then described. Finally, a multi-board FPGA prototype of the proposed system with typical irregular kernels and benchmarks is presented. The experimental evaluation demonstrates the architecture's performance scalability for different configurations of the whole system.
  • Keywords
    field programmable gate arrays; multi-threading; parallel architectures; shared memory systems; synchronisation; PGAS; architecture performance scalability; cache-based architecture; custom architectural building block; data mining; globally-shared address space; interthread synchronization; irregular application scaling; knowledge discovery; memory access pattern; multiboard FPGA prototype; multicore shared-memory system architecture; multinode multicore architecture; multinode shared-memory system architecture; multithreaded multicore processing node; multithreaded shared-memory system architecture; partitioned global address space; physically distributed global memory; social network analysis; transparent hardware support; Bandwidth; Computer architecture; Hardware; Instruction sets; Prototypes; System-on-chip;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Application-Specific Systems, Architectures and Processors (ASAP), 2013 IEEE 24th International Conference on
  • Conference_Location
    Washington, DC
  • ISSN
    2160-0511
  • Print_ISBN
    978-1-4799-0494-5
  • Type

    conf

  • DOI
    10.1109/ASAP.2013.6567595
  • Filename
    6567595