• DocumentCode
    1686308
  • Title
    Feature combination and stacking of recurrent and non-recurrent neural networks for LVCSR
  • Author
    Plahl, Christian ; Kozielski, Michal ; Schlüter, Ralf ; Ney, Hermann
  • Author_Institution
    Comput. Sci. Dept., RWTH Aachen Univ., Aachen, Germany
  • fYear
    2013
  • Firstpage
    6714
  • Lastpage
    6718
  • Abstract
    This paper investigates the combination of different short-term features and the combination of recurrent and non-recurrent neural networks (NNs) on a Spanish speech recognition task. Several methods exist to combine different feature sets, such as concatenation or linear discriminant analysis (LDA). Although all of these techniques achieve reasonable improvements, feature combination by multi-layer perceptrons (MLPs) outperforms all other known approaches. We develop the concept of MLP-based feature combination further using recurrent neural networks (RNNs). The phoneme posterior estimates derived from an RNN lead to a significant improvement over the MLP results, achieving a 5% relative reduction in word error rate (WER) with far fewer parameters. Moreover, we further improve system performance by combining an MLP and an RNN in a hierarchical framework, in which the MLP benefits from the preprocessing performed by the RNN. All NNs are trained on phonemes; nevertheless, the same concepts could be applied using context-dependent states. In addition to improving recognition performance in terms of WER, NN-based feature combination methods reduce both training and testing complexity. Overall, the systems require only a single set of acoustic models, together with the training of the different NNs.
  • Keywords
    acoustic signal processing; error statistics; multilayer perceptrons; natural language processing; recurrent neural nets; speech recognition; LVCSR; MLP based feature combination; RNN; Spanish speech recognition task; WER; acoustic models; context-dependent states; multilayer perceptrons; phoneme posterior estimates; recognition performance; recurrent neural networks; system performance; testing complexity; word error rate; Acoustics; Artificial neural networks; Hidden Markov models; Recurrent neural networks; Speech; Speech recognition; Training; feature combination; long short-term memory; multi-layer perceptron; recurrent neural networks; speech recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
  • Conference_Location
    Vancouver, BC
  • ISSN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2013.6638961
  • Filename
    6638961
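The hierarchical combination summarized in the abstract (an RNN producing phoneme posteriors that a subsequent MLP then uses) can be illustrated with a minimal data-flow sketch. It is not the authors' implementation: every function name here is hypothetical, the weights are random and untrained, a plain Elman RNN stands in for the LSTM-type RNN suggested by the keywords, and feeding the RNN log-posteriors together with the concatenated features into the second-stage MLP is an assumption about how the stacking is wired.

```python
import math
import random
from typing import List

Vector = List[float]


def concat_features(*streams: List[Vector]) -> List[Vector]:
    """Frame-wise concatenation of frame-synchronous feature streams."""
    n_frames = len(streams[0])
    if any(len(s) != n_frames for s in streams):
        raise ValueError("all feature streams must have the same number of frames")
    combined = []
    for t in range(n_frames):
        frame: Vector = []
        for s in streams:
            frame.extend(s[t])  # append this stream's vector for frame t
        combined.append(frame)
    return combined


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def dense(x: Vector, W: List[Vector], b: Vector) -> Vector:
    """Affine map W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> List[Vector]:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def rnn_posteriors(frames, n_hidden, n_phones, rng):
    """Untrained Elman RNN (stand-in for an LSTM) emitting per-frame phoneme posteriors."""
    d = len(frames[0])
    Wx, Wh = rand_matrix(n_hidden, d, rng), rand_matrix(n_hidden, n_hidden, rng)
    Wo = rand_matrix(n_phones, n_hidden, rng)
    zeros_h, zeros_o = [0.0] * n_hidden, [0.0] * n_phones
    h = zeros_h
    out = []
    for x in frames:
        # h_t = tanh(Wx x_t + Wh h_{t-1}); biases are zero in this sketch
        pre = [a + c for a, c in zip(dense(x, Wx, zeros_h), dense(h, Wh, zeros_h))]
        h = [math.tanh(v) for v in pre]
        out.append(softmax(dense(h, Wo, zeros_o)))
    return out


def mlp_posteriors(frames, n_hidden, n_phones, rng):
    """Untrained one-hidden-layer MLP with sigmoid units and a softmax output."""
    d = len(frames[0])
    W1, W2 = rand_matrix(n_hidden, d, rng), rand_matrix(n_phones, n_hidden, rng)
    out = []
    for x in frames:
        hid = [1.0 / (1.0 + math.exp(-v)) for v in dense(x, W1, [0.0] * n_hidden)]
        out.append(softmax(dense(hid, W2, [0.0] * n_phones)))
    return out


def hierarchical_posteriors(streams, n_phones=5, n_hidden=8, seed=0):
    """RNN first stage, MLP second stage (hypothetical wiring)."""
    rng = random.Random(seed)
    x = concat_features(*streams)
    first = rnn_posteriors(x, n_hidden, n_phones, rng)
    # Assumption: the MLP sees the concatenated features plus the RNN log-posteriors.
    stage2_in = [xi + [math.log(p) for p in pi] for xi, pi in zip(x, first)]
    return mlp_posteriors(stage2_in, n_hidden, n_phones, rng)
```

For two streams with 2 and 3 coefficients per frame, `concat_features` yields 5-dimensional frames, and `hierarchical_posteriors` returns one posterior vector per frame whose entries sum to 1. In a real system both networks would be trained on phoneme targets, as the abstract states, before their outputs are used as features for the acoustic models.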