DjiNN and Tonic: DNN as a service and its implications for future warehouse scale computers

Author

Johann Hauswald;Yiping Kang;Michael A. Laurenzano;Quan Chen;Cheng Li;Trevor Mudge;Ronald G. Dreslinski;Jason Mars;Lingjia Tang

Author_Institution

Clarity Lab, University of Michigan - Ann Arbor, USA

fYear

2015

fDate

6/1/2015

Firstpage

27

Lastpage

40

Abstract

As applications such as Apple Siri, Google Now, Microsoft Cortana, and Amazon Echo continue to gain traction, web service companies are adopting large deep neural networks (DNNs) for machine learning challenges such as image processing, speech recognition, and natural language processing. A number of open questions arise as to the design of a server platform specialized for DNNs and how modern warehouse scale computers (WSCs) should be outfitted to provide DNN as a service for these applications. In this paper, we present DjiNN, an open infrastructure for DNN as a service in WSCs, and Tonic Suite, a suite of 7 end-to-end applications that span image, speech, and language processing. We use DjiNN to design a high-throughput DNN system based on massive GPU server designs and provide insights into the varying characteristics across applications. After studying the throughput, bandwidth, and power properties of DjiNN and Tonic Suite, we investigate several design points for future WSC architectures, including the total cost of ownership implications of a WSC with a disaggregated GPU pool versus a WSC composed of homogeneous integrated GPU servers. We improve DNN throughput by over 120× for all but one application (40× for Facial Recognition) on an NVIDIA K40 GPU. On a GPU server composed of 8 NVIDIA K40s, we achieve near-linear scaling (around 1000× throughput improvement) for 3 of the 7 applications. Through our analysis, we also find that GPU-enabled WSCs improve total cost of ownership over CPU-only designs by 4-20×, depending on the composition of the workload.

Keywords

"Servers","Graphics processing units","Throughput","Neural networks","Neurons","Libraries","Face"

Publisher

IEEE

Conference_Titel

Computer Architecture (ISCA), 2015 ACM/IEEE 42nd Annual International Symposium on

Type

conf

DOI

10.1145/2749469.2749472

Filename

7284053