DocumentCode :
3744818
Title :
Training data pseudo-shuffling and direct decoding framework for recurrent neural network based acoustic modeling
Author :
Naoyuki Kanda;Mitsuyoshi Tachimori;Xugang Lu;Hisashi Kawai
Author_Institution :
National Institute of Information and Communications Technology, Japan
fYear :
2015
Firstpage :
15
Lastpage :
21
Abstract :
We propose two techniques to enhance the performance of recurrent neural network (RNN)-based acoustic models. The first technique addresses training efficiency. Because RNNs require sequential input, it is difficult to randomly shuffle training samples to accelerate stochastic gradient descent (SGD)-based training. We propose a "pseudo-shuffling" procedure that instead increases the unexpectedness of training samples by skipping successive samples. The second technique is a novel "direct decoding" framework in which the posterior probability of the RNN is fed into the decoder without being converted into a hidden Markov model emission probability. In our large-vocabulary speech recognition experiments with English lecture recordings, the first technique significantly improved RNN training efficiency, yielding a 14.3% relative word error rate (WER) improvement. The second technique achieved a further 3.1% relative WER improvement. Our sigmoid-type RNN achieved a 10.7% better WER than same-sized deep neural networks, without using long short-term memory cells.
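Note: the abstract does not spell out the pseudo-shuffling procedure. As a hedged illustration only, the sketch below shows one plausible reading of "skipping successive samples": frames of an utterance are visited with a fixed stride so that consecutive SGD updates see non-adjacent, less correlated samples. The function name pseudo_shuffle_indices and the stride-based scheme are assumptions for illustration, not the authors' exact method.

```python
# Hypothetical illustration of frame-skipping "pseudo-shuffling" for SGD.
# Assumption: frames of one utterance are visited with a stride k
# (t, t+k, t+2k, ...), then the sweep restarts at the next offset
# (t+1, t+1+k, ...), so successive updates are less correlated than a
# strictly sequential pass over the frames.

def pseudo_shuffle_indices(num_frames: int, stride: int):
    """Yield frame indices of one utterance in strided (pseudo-shuffled) order."""
    for offset in range(stride):
        for t in range(offset, num_frames, stride):
            yield t

# Example: 10 frames with stride 3 are visited as
# 0, 3, 6, 9, 1, 4, 7, 2, 5, 8 instead of 0, 1, 2, ..., 9.
if __name__ == "__main__":
    print(list(pseudo_shuffle_indices(10, 3)))
```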
Keywords :
"Training","Hidden Markov models","Decoding","Recurrent neural networks","Acoustics","Speech recognition"
Publisher :
ieee
Conference_Titel :
Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on
Type :
conf
DOI :
10.1109/ASRU.2015.7404768
Filename :
7404768