  • DocumentCode
    3663916
  • Title
    DjiNN and Tonic: DNN as a service and its implications for future warehouse scale computers
  • Author
    Johann Hauswald;Yiping Kang;Michael A. Laurenzano;Quan Chen;Cheng Li;Trevor Mudge;Ronald G. Dreslinski;Jason Mars;Lingjia Tang
  • Author_Institution
    Clarity Lab, University of Michigan - Ann Arbor, USA
  • fYear
    2015
  • fDate
    6/1/2015 12:00:00 AM
  • Firstpage
    27
  • Lastpage
    40
  • Abstract
    As applications such as Apple Siri, Google Now, Microsoft Cortana, and Amazon Echo continue to gain traction, web service companies are adopting large deep neural networks (DNNs) for machine learning challenges such as image processing, speech recognition, and natural language processing. A number of open questions arise as to the design of a server platform specialized for DNNs and how modern warehouse scale computers (WSCs) should be outfitted to provide DNN as a service for these applications. In this paper, we present DjiNN, an open infrastructure for DNN as a service in WSCs, and Tonic Suite, a suite of 7 end-to-end applications that span image, speech, and language processing. We use DjiNN to design a high-throughput DNN system based on massive GPU server designs and provide insights into the varying characteristics across applications. After studying the throughput, bandwidth, and power properties of DjiNN and Tonic Suite, we investigate several design points for future WSC architectures. We investigate the total cost of ownership implications of having a WSC with a disaggregated GPU pool versus a WSC composed of homogeneous integrated GPU servers. We improve DNN throughput by over 120× for all but one application (40× for Facial Recognition) on an NVIDIA K40 GPU. On a GPU server composed of 8 NVIDIA K40s, we achieve near-linear scaling (around 1000× throughput improvement) for 3 of the 7 applications. Through our analysis, we also find that GPU-enabled WSCs improve total cost of ownership over CPU-only designs by 4-20×, depending on the composition of the workload.
  • Keywords
    "Servers","Graphics processing units","Throughput","Neural networks","Neurons","Libraries","Face"
  • Publisher
    IEEE
  • Conference_Titel
    Computer Architecture (ISCA), 2015 ACM/IEEE 42nd Annual International Symposium on
  • Type
    conf
  • DOI
    10.1145/2749469.2749472
  • Filename
    7284053