DocumentCode :
1799912
Title :
DaDianNao: A Machine-Learning Supercomputer
Author :
Yunji Chen ; Tao Luo ; Shaoli Liu ; Shijin Zhang ; Liqiang He ; Jia Wang ; Ling Li ; Tianshi Chen ; Zhiwei Xu ; Ninghui Sun ; Olivier Temam
Author_Institution :
SKL of Comput. Archit., ICT, Beijing, China
fYear :
2014
fDate :
13-17 Dec. 2014
Firstpage :
609
Lastpage :
622
Abstract :
Many companies are deploying services, for consumers or industry, that rely largely on machine-learning algorithms for sophisticated processing of large amounts of data. The state-of-the-art and most popular such machine-learning algorithms are Convolutional and Deep Neural Networks (CNNs and DNNs), which are known to be both computationally and memory intensive. A number of neural network accelerators have recently been proposed that offer a high computational-capacity-to-area ratio but remain hampered by memory accesses. However, unlike the memory wall faced by processors on general-purpose workloads, the memory footprint of CNNs and DNNs, while large, is not beyond the capability of the on-chip storage of a multi-chip system. This property, combined with the algorithmic characteristics of CNNs/DNNs, can lead to high internal bandwidth and low external communication, which in turn enables a high degree of parallelism at reasonable area cost. In this article, we introduce a custom multi-chip machine-learning architecture along those lines. We show that, on a subset of the largest known neural network layers, a 64-chip system can achieve a speedup of 450.65x over a GPU and reduce energy by 150.31x on average. We implement the node down to place and route at 28 nm, combining custom storage and computational units with industry-grade interconnects.
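The abstract's central observation is a capacity argument: the weights of even very large network layers, while too big for one chip, fit in the aggregate on-chip storage of a multi-chip system, so external memory traffic can be largely avoided. The sketch below illustrates that arithmetic; all capacity and layer-size numbers are illustrative assumptions, not figures from the paper.

```python
# Back-of-the-envelope check of the multi-chip capacity argument.
# Per-chip storage (32 MB) and the layer size (4096 x 4096, 16-bit
# weights) are hypothetical values chosen for illustration only.

def total_onchip_mb(chips: int, mb_per_chip: float) -> float:
    """Aggregate on-chip storage across all chips, in MB."""
    return chips * mb_per_chip

def layer_weight_mb(inputs: int, outputs: int, bytes_per_weight: int = 2) -> float:
    """Weight footprint of a fully connected layer, in MB."""
    return inputs * outputs * bytes_per_weight / (1024 * 1024)

# Hypothetical 64-chip system, 32 MB of on-chip eDRAM per chip.
capacity = total_onchip_mb(chips=64, mb_per_chip=32)

# Hypothetical large fully connected layer: 4096 x 4096, 16-bit weights.
footprint = layer_weight_mb(4096, 4096)

print(f"aggregate on-chip storage: {capacity:.0f} MB")  # 2048 MB
print(f"layer weight footprint:   {footprint:.0f} MB")  # 32 MB
print("fits on chip:", footprint <= capacity)           # True
```

Once weights are partitioned across chips and pinned in on-chip storage, only neuron activations cross chip boundaries, which is the source of the high-internal-bandwidth, low-external-communication property the abstract describes.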
Keywords :
learning (artificial intelligence); mainframes; neural nets; parallel machines; CNN-DNN algorithmic characteristics; DaDianNao; GPU; computational capacity-area ratio; computational units; convolutional neural network; custom storage; deep neural network; general-purpose workloads; high-degree parallelism; industry-grade interconnects; machine-learning supercomputer; multichip machine-learning architecture; multichip system; neural network accelerators; Bandwidth; Biological neural networks; Computer architecture; Graphics processing units; Hardware; Kernel; Neurons; accelerator; computer architecture; machine learning; neural network;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
2014 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)
Conference_Location :
Cambridge
ISSN :
1072-4451
Type :
conf
DOI :
10.1109/MICRO.2014.58
Filename :
7011421