DocumentCode
3226745
Title
Optimized Deep Learning Architectures with Fast Matrix Operation Kernels on Parallel Platform
Author
Ying Zhang; Saizheng Zhang
Author_Institution
Dept. of Autom., Univ. of Sci. & Technol. of China, Hefei, China
fYear
2013
fDate
4-6 Nov. 2013
Firstpage
71
Lastpage
78
Abstract
In this paper, we introduce an optimized deep learning architecture with flexible layer structures and fast matrix operation kernels on a parallel computing platform (e.g., NVIDIA's GPU). Carefully designed layer-wise strategies integrate different kinds of deep architectures into a uniform neural training-testing system. Our fast matrix operation kernels are implemented in the deep architecture's propagation processes. In our experiments, these kernels save 70% of execution time on average compared with the kernels in NVIDIA's CUBLAS library (widely used by many other neural network toolkits), and help our parallel deep architecture beat neural structures using CUBLAS kernels on practical problems.
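The abstract's central point is that propagation through a deep architecture reduces to dense matrix operations (roughly, activations = f(Wx + b) per layer), which is why faster matrix kernels directly speed up training and testing. A minimal pure-Python sketch of that layer-wise propagation (function and variable names here are illustrative assumptions, not from the paper; the GPU kernels the authors optimize would replace the `matvec` step):

```python
import math

def sigmoid(z):
    # Standard logistic activation, a common choice for deep layers.
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, x):
    # Dense matrix-vector product: the operation that, batched on a GPU,
    # corresponds to the kernels benchmarked against CUBLAS.
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def forward(layers, x):
    # layers: list of (W, b) pairs; applies each layer's affine map
    # followed by the nonlinearity, layer by layer.
    for W, b in layers:
        x = [sigmoid(z + b_i) for z, b_i in zip(matvec(W, x), b)]
    return x

# Tiny two-layer example: 2 -> 2 -> 1 units.
layers = [
    ([[0.5, -0.2], [0.1, 0.3]], [0.0, 0.1]),
    ([[1.0, 1.0]], [-0.5]),
]
out = forward(layers, [1.0, 2.0])
```

Since every layer's cost is dominated by `matvec` (a GEMV/GEMM call on the GPU), a 70% kernel-time reduction translates almost directly into end-to-end propagation speedup.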
Keywords
graphics processing units; learning (artificial intelligence); matrix algebra; neural nets; parallel architectures; parallel programming; NVIDIA CUBLAS library; NVIDIA GPU; deep architecture propagation process; fast matrix operation kernels; flexible layer structures; layer-wise strategies; neural network toolkits; neural structure; neural training-testing system; optimized deep learning architecture; parallel computing platform; parallel deep architecture; Computer architecture; Graphics processing units; Integrated circuits; Kernel; Libraries; Training; Vectors; GPU; deep architecture; deep learning; kernel; matrix operation; parallel computing;
fLanguage
English
Publisher
ieee
Conference_Titel
Tools with Artificial Intelligence (ICTAI), 2013 IEEE 25th International Conference on
Conference_Location
Herndon, VA
ISSN
1082-3409
Print_ISBN
978-1-4799-2971-9
Type
conf
DOI
10.1109/ICTAI.2013.21
Filename
6735232
Link To Document