• DocumentCode
    228747
  • Title
    Parallel Deep Neural Network Training for Big Data on Blue Gene/Q
  • Author
    Chung, I-Hsin; Sainath, Tara N.; Ramabhadran, Bhuvana; Picheny, Michael; Gunnels, John; Austel, Vernon; Chaudhari, Upendra; Kingsbury, Brian
  • Author_Institution
    IBM T.J. Watson Res. Center, Yorktown Heights, NY, USA
  • fYear
    2014
  • fDate
    16-21 Nov. 2014
  • Firstpage
    745
  • Lastpage
    753
  • Abstract
    Deep Neural Networks (DNNs) have recently been shown to significantly outperform existing machine learning techniques in several pattern recognition tasks. DNNs are the state-of-the-art models used in image recognition, object detection, classification and tracking, and speech and language processing applications. The biggest drawback to DNNs has been the enormous cost in computation and time needed to train the parameters of the networks - often a tenfold increase relative to conventional technologies. Such training costs can be mitigated by parallel computing algorithms and architectures. However, these algorithms often run into inter-processor communication bottlenecks. In this paper, we describe how to enable parallel deep neural network training on the IBM Blue Gene/Q (BG/Q) computer system. Specifically, we explore DNN training using the data-parallel Hessian-free second-order optimization algorithm. Such an algorithm is particularly well suited to parallelization across a large set of loosely coupled processors. BG/Q, with its excellent inter-processor communication characteristics, is an ideal match for this type of algorithm. The paper discusses how issues regarding the programming model and data-dependent imbalances are addressed. Results on large-scale speech tasks show that performance on BG/Q scales linearly up to 4096 processes with no loss in accuracy, allowing us to train neural networks on billions of training examples in a few hours.
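    A minimal sketch of the data-parallel pattern the abstract describes: each worker computes a gradient on its own shard of the batch, and an allreduce-style average combines the results before the weight update. This is not the paper's implementation - it uses plain gradient descent on a toy least-squares model rather than Hessian-free second-order optimization, and all names below are illustrative.

```python
def local_gradient(shard, w):
    # Toy least-squares gradient for the model y = w * x on one data shard.
    g = 0.0
    for x, y in shard:
        g += 2.0 * (w * x - y) * x
    return g / len(shard)

def allreduce_mean(values):
    # Stand-in for the inter-processor averaging step
    # (e.g. an MPI_Allreduce across BG/Q processes).
    return sum(values) / len(values)

def data_parallel_step(data, w, n_workers, lr=0.01):
    # Shard the batch round-robin across workers, compute local
    # gradients independently, then combine and take one update.
    shards = [data[i::n_workers] for i in range(n_workers)]
    grads = [local_gradient(s, w) for s in shards]
    return w - lr * allreduce_mean(grads)

if __name__ == "__main__":
    data = [(x, 3.0 * x) for x in range(1, 9)]  # ground truth: w = 3
    w = 0.0
    for _ in range(200):
        w = data_parallel_step(data, w, n_workers=4)
    print(round(w, 3))  # converges to 3.0
```

    Because each worker touches only its own shard between communication steps, the per-step communication volume is one gradient average, which is what makes the approach scale across loosely coupled processors.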
  • Keywords
    Big Data; learning (artificial intelligence); neural nets; parallel architectures; pattern recognition; DNN; IBM BG/Q computer system; IBM Blue Gene/Q computer system; big data; data-parallel Hessian-free 2nd order optimization algorithm; interprocessor communication characteristics; machine learning techniques; parallel architectures; parallel computing algorithms; parallel deep neural network training; pattern recognition tasks; programming model; training time costs; Neural networks; Optimization; Prefetching; Speech recognition; Synchronization; Training; Big Data; High Performance Computing; Speech Recognition;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    SC14: International Conference for High Performance Computing, Networking, Storage and Analysis
  • Conference_Location
    New Orleans, LA
  • Print_ISBN
    978-1-4799-5499-5
  • Type
    conf
  • DOI
    10.1109/SC.2014.66
  • Filename
    7013048