• DocumentCode
    730716
  • Title
    Context dependent phone models for LSTM RNN acoustic modelling
  • Author
    Senior, Andrew ; Sak, Hasim ; Shafran, Izhak

  • Author_Institution
    Google Inc., New York, NY, USA
  • fYear
    2015
  • fDate
    19-24 April 2015
  • Firstpage
    4585
  • Lastpage
    4589
  • Abstract
    Long Short-Term Memory Recurrent Neural Networks (LSTM RNNs), combined with hidden Markov models (HMMs), have recently been shown to outperform other acoustic models such as Gaussian mixture models (GMMs) and deep neural networks (DNNs) for large scale speech recognition. We argue that using multi-state HMMs with LSTM RNN acoustic models is an unnecessary vestige of GMM-HMM and DNN-HMM modelling since LSTM RNNs are able to predict output distributions through continuous, instead of piece-wise stationary, modelling of the acoustic trajectory. We demonstrate equivalent results for context independent whole-phone or 3-state models and show that minimum-duration modelling can lead to improved results. We go on to show that context dependent whole-phone models can perform as well as context dependent states, given a minimum duration model.
  • Keywords
    recurrent neural nets; speech recognition; LSTM RNN acoustic modelling; context dependent phone models; large scale speech recognition; long short term memory recurrent neural networks; multistate hidden Markov models; Acoustics; Context; Context modeling; Hidden Markov models; Recurrent neural networks; Speech recognition; Training; Hybrid neural networks; Long Short-Term Memory Recurrent Neural Networks; context dependent phone models; hidden Markov models;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
  • Conference_Location
    South Brisbane, QLD
  • Type
    conf
  • DOI
    10.1109/ICASSP.2015.7178839
  • Filename
    7178839