Title :
Parallel Deep Neural Network Training for Big Data on Blue Gene/Q
Author :
Chung, I-Hsin ; Sainath, Tara N. ; Ramabhadran, Bhuvana ; Picheny, Michael ; Gunnels, John ; Austel, Vernon ; Chaudhari, Upendra ; Kingsbury, Brian
Author_Institution :
IBM T.J. Watson Res. Center, Yorktown Heights, NY, USA
Abstract :
Deep Neural Networks (DNNs) have recently been shown to significantly outperform existing machine learning techniques in several pattern recognition tasks. DNNs are the state-of-the-art models used in image recognition, object detection, classification and tracking, and speech and language processing applications. The biggest drawback of DNNs has been the enormous computational cost and time required to train the network parameters - often a tenfold increase relative to conventional technologies. Such training costs can be mitigated by parallel computing algorithms and architectures. However, these algorithms often run into difficulties because of inter-processor communication bottlenecks. In this paper, we describe how to enable parallel deep neural network training on the IBM Blue Gene/Q (BG/Q) computer system. Specifically, we explore DNN training using the data-parallel Hessian-free second-order optimization algorithm, which is particularly well suited to parallelization across a large set of loosely coupled processors. BG/Q, with its excellent inter-processor communication characteristics, is an ideal match for this type of algorithm. We discuss how issues regarding the programming model and data-dependent load imbalances are addressed. Results on large-scale speech tasks show that performance on BG/Q scales linearly up to 4096 processes with no loss in accuracy, allowing us to train neural networks on billions of training examples in a few hours.
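To make the communication pattern concrete, the following is a minimal sketch of the data-parallel reduction step the abstract describes, assuming a generic MPI environment rather than the authors' BG/Q implementation: each rank computes a partial gradient (or curvature-vector product) over its own shard of the training data, and an all-reduce sums the partials so that every rank holds the same global result. The constant DIM and the function local_gradient are illustrative placeholders, not names from the paper.

    /*
     * Sketch of data-parallel gradient aggregation with MPI.
     * Each rank processes its own data shard; MPI_Allreduce sums
     * the partial gradients so all ranks see the global gradient.
     */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define DIM 1024  /* illustrative parameter-vector length */

    /* Placeholder for the per-shard gradient computation. */
    static void local_gradient(double *g, int rank) {
        for (int i = 0; i < DIM; i++)
            g[i] = (double)rank / DIM;  /* dummy values */
    }

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double *grad = malloc(DIM * sizeof(double));
        local_gradient(grad, rank);  /* each rank works on its shard */

        /* Sum the partial gradients across all ranks in place;
           every rank ends up holding the global gradient. */
        MPI_Allreduce(MPI_IN_PLACE, grad, DIM, MPI_DOUBLE, MPI_SUM,
                      MPI_COMM_WORLD);

        if (rank == 0)
            printf("global grad[0] = %f (from %d ranks)\n",
                   grad[0], size);

        free(grad);
        MPI_Finalize();
        return 0;
    }

The single collective per optimization step is what makes this pattern a good fit for a machine with fast inter-processor communication such as BG/Q.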
Keywords :
Big Data; learning (artificial intelligence); neural nets; parallel architectures; pattern recognition; DNN; IBM Blue Gene/Q computer system; data-parallel Hessian-free second-order optimization algorithm; interprocessor communication characteristics; machine learning techniques; parallel computing algorithms; parallel deep neural network training; pattern recognition tasks; programming model; training time costs; Neural networks; Optimization; Prefetching; Speech recognition; Synchronization; Training; High Performance Computing; Speech Recognition
Conference_Titel :
SC14: International Conference for High Performance Computing, Networking, Storage and Analysis
Conference_Location :
New Orleans, LA, USA
Print_ISBN :
978-1-4799-5499-5