• DocumentCode
    1799902
  • Title

    Multi-GPU System Design with Memory Networks

  • Author

    Gwangsun Kim ; Minseok Lee ; Jiyun Jeong ; Jung-Ho Kim

  • Author_Institution
    Dept. of Comput. Sci., KAIST, Daejeon, South Korea
  • fYear
    2014
  • fDate
    13-17 Dec. 2014
  • Firstpage
    484
  • Lastpage
    495
  • Abstract
    GPUs are widely used to accelerate different workloads, and multi-GPU systems can provide higher performance with multiple discrete GPUs interconnected together. However, there are two main communication bottlenecks in multi-GPU systems: accessing remote GPU memory and communication between the GPUs and the host CPU. Recent advances in multi-GPU programming, including unified virtual addressing and unified memory from NVIDIA, have made programming simpler, but costly remote memory access still makes multi-GPU programming difficult. To overcome these communication limitations, we propose to leverage a memory network based on hybrid memory cubes (HMCs) to simplify multi-GPU memory management and improve programmability. In particular, we propose scalable kernel execution (SKE), where multiple GPUs are viewed as a single virtual GPU so that a single kernel can be executed across multiple GPUs without modifying the source code. To fully enable the benefits of SKE, we explore alternative memory network designs in a multi-GPU system. We propose a GPU memory network (GMN) to simplify data sharing between the discrete GPUs, while a CPU memory network (CMN) is used to simplify data communication between the host CPU and the discrete GPUs. These two types of networks can be combined to create a unified memory network (UMN), where the communication bottleneck in multi-GPU systems can be significantly reduced because both the CPU and the GPUs share the memory network. We evaluate alternative network designs and propose a sliced flattened butterfly topology for the memory network that scales better than previously proposed alternative topologies by removing local HMC channels. In addition, we propose an overlay network organization for the unified memory network to minimize the latency of CPU accesses while providing high bandwidth for the GPUs. We evaluate trade-offs between the different memory network organizations and show how UMN significantly reduces the communication bottleneck in multi-GPU systems.
  • Keywords
    graphics processing units; hypercube networks; network topology; random-access storage; storage management; CMN; CPU memory network; GMN; GPU memory network; GPU-host CPU communication; HMC; SKE; UMN; data communication; data sharing; hybrid memory cubes; multiGPU memory management; multiGPU programming; multiGPU system design; overlay network organization; remote GPU memory; scalable kernel execution; sliced flattened butterfly topology; unified memory network; Bandwidth; Graphics processing units; Kernel; Memory management; Network topology; Runtime; Topology; Flattened butterfly; Hybrid Memory Cubes; Memory network;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Microarchitecture (MICRO), 2014 47th Annual IEEE/ACM International Symposium on
  • Conference_Location
    Cambridge
  • ISSN
    1072-4451
  • Type

    conf

  • DOI
    10.1109/MICRO.2014.55
  • Filename
    7011411