• DocumentCode
    177469
  • Title

    Improving deep neural network acoustic models using generalized maxout networks

  • Author

    Zhang, Xiaohui ; Trmal, Jan ; Povey, Daniel ; Khudanpur, Sanjeev

  • Author_Institution
    Center for Language & Speech Process. & Human Language Technol., Johns Hopkins Univ., Baltimore, MD, USA
  • fYear
    2014
  • fDate
    4-9 May 2014
  • Firstpage
    215
  • Lastpage
    219
  • Abstract
    Recently, maxout networks have brought significant improvements to various speech recognition and computer vision tasks. In this paper we introduce two new types of generalized maxout units, which we call p-norm and soft-maxout. We investigate their performance in Large Vocabulary Continuous Speech Recognition (LVCSR) tasks in various languages with 10 hours and 60 hours of data, and find that the p-norm generalization of maxout consistently performs well. Because our training setup sometimes becomes unstable when training unbounded-output nonlinearities such as these, we also present a method to control that instability. This is the “normalization layer”, a nonlinearity that scales down all dimensions of its input in order to stop the average squared output from exceeding one. The performance of our proposed nonlinearities is compared with maxout, rectified linear units (ReLU), tanh units, and also with a discriminatively trained SGMM/HMM system, and our p-norm units with p equal to 2 are found to perform best.
  • Keywords
    generalisation (artificial intelligence); neural nets; speech recognition; LVCSR task; ReLU; computer vision task; deep neural network acoustic models; generalized maxout networks; large vocabulary continuous speech recognition; normalization layer; p-norm generalization; p-norm units; rectified linear units; soft-maxout; unbounded-output nonlinearities; Acoustics; Neural networks; Speech; Speech processing; Speech recognition; Training; Training data; Acoustic Modeling; Deep Learning; Maxout Networks; Speech Recognition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
  • Conference_Location
    Florence
  • Type

    conf

  • DOI
    10.1109/ICASSP.2014.6853589
  • Filename
    6853589
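
The units named in the abstract can be illustrated with a minimal sketch. The record itself gives no formulas, so everything below is an assumption based on the abstract's wording and on the usual definitions: maxout as the maximum over a group of inputs, p-norm as (sum |x_i|^p)^(1/p) over the group, soft-maxout as the log of the summed exponentials, and the normalization layer as a rescaling applied only when the root-mean-square of its input exceeds one. All function names are hypothetical, and this is not the authors' implementation.

```python
import math


def maxout_unit(group):
    """Standard maxout: the largest input in the group."""
    return max(group)


def p_norm_unit(group, p=2.0):
    """Assumed p-norm unit: (sum_i |x_i|^p)^(1/p) over one group."""
    return sum(abs(x) ** p for x in group) ** (1.0 / p)


def soft_maxout_unit(group):
    """Assumed soft-maxout unit: log(sum_i exp(x_i)).

    The maximum is subtracted before exponentiating for numerical stability.
    """
    m = max(group)
    return m + math.log(sum(math.exp(x - m) for x in group))


def normalization_layer(values):
    """Scale all dimensions down only if the mean squared value exceeds one."""
    rms = math.sqrt(sum(v * v for v in values) / len(values))
    scale = 1.0 / rms if rms > 1.0 else 1.0
    return [v * scale for v in values]


def grouped(values, group_size):
    """Split a flat list into consecutive, non-overlapping groups."""
    if len(values) % group_size != 0:
        raise ValueError("input length must be a multiple of group_size")
    return [values[i:i + group_size] for i in range(0, len(values), group_size)]


def p_norm_layer(pre_activations, group_size, p=2.0):
    """Apply the p-norm unit to each group, then the normalization layer."""
    units = [p_norm_unit(g, p) for g in grouped(pre_activations, group_size)]
    return normalization_layer(units)
```

With p equal to 2, the setting the abstract reports as performing best, `p_norm_unit` reduces to the Euclidean norm of the group. Because that output is unbounded, `p_norm_layer` passes it through `normalization_layer`, which leaves small activations unchanged and caps the average squared output at one otherwise.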