Title :
Comparing a High and Low-Level Deep Neural Network Implementation for Automatic Speech Recognition
Author :
Ray, Jessica; Thompson, Brian; Shen, Wade
Author_Institution :
MIT Lincoln Laboratory, Lexington, MA, USA
Abstract :
The use of deep neural networks (DNNs) has improved performance in several fields, including computer vision, natural language processing, and automatic speech recognition (ASR). The increased use of DNNs in recent years has been driven largely by the performance gains afforded by GPUs, as the computational cost of training large networks on a CPU is prohibitive. Many training algorithms are well-suited to the GPU; however, writing hand-optimized GPGPU code is a significant undertaking. More recently, high-level libraries have attempted to simplify GPGPU development by automatically performing tasks such as optimization and code generation. This work uses Theano, a high-level Python library, to implement a DNN for phone recognition in ASR. Performance is compared against a low-level, hand-optimized C++/CUDA DNN implementation from Kaldi, a popular ASR toolkit. Results show that the DNN implementation in Theano has CPU and GPU runtimes on par with those of Kaldi, while requiring approximately 95% fewer lines of code.
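To make the high-level/low-level contrast concrete, the following is a minimal sketch of the kind of Theano code the abstract describes: a single-hidden-layer network trained by gradient descent. The layer sizes, learning rate, and variable names are illustrative assumptions, not values taken from the paper.

    import numpy as np
    import theano
    import theano.tensor as T

    rng = np.random.RandomState(0)
    n_in, n_hidden, n_out = 440, 1024, 40  # illustrative sizes, not from the paper

    x = T.matrix('x')    # minibatch of acoustic feature vectors
    y = T.ivector('y')   # integer phone labels

    def shared_weights(n_rows, n_cols, name):
        # Small random initialization; Theano keeps shared variables on the GPU when one is available.
        w = rng.uniform(-0.01, 0.01, (n_rows, n_cols)).astype(theano.config.floatX)
        return theano.shared(w, name=name)

    W1 = shared_weights(n_in, n_hidden, 'W1')
    b1 = theano.shared(np.zeros(n_hidden, dtype=theano.config.floatX), name='b1')
    W2 = shared_weights(n_hidden, n_out, 'W2')
    b2 = theano.shared(np.zeros(n_out, dtype=theano.config.floatX), name='b2')

    h = T.nnet.sigmoid(T.dot(x, W1) + b1)    # hidden layer
    p_y = T.nnet.softmax(T.dot(h, W2) + b2)  # phone posteriors

    # Negative log-likelihood of the correct labels.
    cost = -T.mean(T.log(p_y)[T.arange(y.shape[0]), y])

    params = [W1, b1, W2, b2]
    grads = T.grad(cost, params)  # symbolic differentiation
    lr = 0.1
    updates = [(p, p - lr * g) for p, g in zip(params, grads)]

    # Theano compiles the symbolic graph into optimized CPU or GPU code.
    train_step = theano.function([x, y], cost, updates=updates)

The symbolic differentiation in T.grad and the compilation in theano.function stand in for the hand-written gradient and kernel code on the C++/CUDA side, which is the source of the roughly 95% reduction in lines of code the abstract reports.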
Keywords :
learning (artificial intelligence); neural nets; optimisation; speech recognition; ASR toolkit; CPU runtimes; GPU runtimes; automatic speech recognition; code generation; computer vision; deep neural networks; hand-optimized C++/CUDA DNN implementation; hand-optimized GPGPU code; high-level Python library; low-level deep neural network; natural language processing; optimization; phone recognition; training algorithms; Acoustics; Graphics processing units; Hidden Markov models; Neural networks; Speech; Training; Vectors; Python; Theano; DNN; Kaldi; CUDA; GPU
Conference_Title :
2014 First Workshop for High Performance Technical Computing in Dynamic Languages (HPTCDL)
DOI :
10.1109/HPTCDL.2014.12