DocumentCode
3226745
Title
Optimized Deep Learning Architectures with Fast Matrix Operation Kernels on Parallel Platform
Author
Ying Zhang; Saizheng Zhang
Author_Institution
Dept. of Autom., Univ. of Sci. & Technol. of China, Hefei, China
fYear
2013
fDate
4-6 Nov. 2013
Firstpage
71
Lastpage
78
Abstract
In this paper, we introduce an optimized deep learning architecture with flexible layer structures and fast matrix operation kernels on a parallel computing platform (e.g., NVIDIA's GPU). Carefully designed layer-wise strategies integrate different kinds of deep architectures into a uniform neural training-testing system. Our fast matrix operation kernels are implemented in the deep architecture's propagation processes. In our experiments, these kernels save 70% of execution time on average compared with the kernels in NVIDIA's CUBLAS library (widely used by many other neural network toolkits), and help our parallel deep architecture beat neural structures using CUBLAS kernels on practical problems.
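The abstract's central point is that propagation through a deep architecture reduces to dense matrix operations (roughly, activations = f(Wx + b) per layer), which is why faster matrix kernels directly speed up training and testing. A minimal pure-Python sketch of that layer-wise propagation (function and variable names here are illustrative assumptions, not from the paper; the GPU kernels the authors optimize would replace the `matvec` step):

```python
import math

def sigmoid(z):
    # Standard logistic activation, a common choice for deep layers.
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, x):
    # Dense matrix-vector product: the operation that, batched on a GPU,
    # corresponds to the kernels benchmarked against CUBLAS.
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def forward(layers, x):
    # layers: list of (W, b) pairs; applies each layer's affine map
    # followed by the nonlinearity, layer by layer.
    for W, b in layers:
        x = [sigmoid(z + b_i) for z, b_i in zip(matvec(W, x), b)]
    return x

# Tiny two-layer example: 2 -> 2 -> 1 units.
layers = [
    ([[0.5, -0.2], [0.1, 0.3]], [0.0, 0.1]),
    ([[1.0, 1.0]], [-0.5]),
]
out = forward(layers, [1.0, 2.0])
```

Since every layer's cost is dominated by `matvec` (a GEMV/GEMM call on the GPU), a 70% kernel-time reduction translates almost directly into end-to-end propagation speedup.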
Keywords
graphics processing units; learning (artificial intelligence); matrix algebra; neural nets; parallel architectures; parallel programming; NVIDIA CUBLAS library; NVIDIA GPU; deep architecture propagation process; fast matrix operation kernels; flexible layer structures; layer-wise strategies; neural network toolkits; neural structure; neural training-testing system; optimized deep learning architecture; parallel computing platform; parallel deep architecture; Computer architecture; Graphics processing units; Integrated circuits; Kernel; Libraries; Training; Vectors; GPU; deep architecture; deep learning; kernel; matrix operation; parallel computing;
fLanguage
English
Publisher
ieee
Conference_Titel
Tools with Artificial Intelligence (ICTAI), 2013 IEEE 25th International Conference on
Conference_Location
Herndon, VA
ISSN
1082-3409
Print_ISBN
978-1-4799-2971-9
Type
conf
DOI
10.1109/ICTAI.2013.21
Filename
6735232
Link To Document