DocumentCode :
3744818
Title :
Training data pseudo-shuffling and direct decoding framework for recurrent neural network based acoustic modeling
Author :
Naoyuki Kanda;Mitsuyoshi Tachimori;Xugang Lu;Hisashi Kawai
Author_Institution :
National Institute of Information and Communications Technology, Japan
fYear :
2015
Firstpage :
15
Lastpage :
21
Abstract :
We propose two techniques to enhance the performance of recurrent neural network (RNN)-based acoustic models. The first technique addresses training efficiency. Because RNNs require sequential input, it is difficult to randomly shuffle training samples to accelerate stochastic gradient descent (SGD)-based training. We propose a "pseudo-shuffling" procedure that instead increases the unexpectedness of training samples by skipping successive samples. The second technique is a novel "direct decoding" framework in which the posterior probability of the RNN is fed into the decoder without being converted into a hidden Markov model emission probability. In our large-vocabulary speech recognition experiments with English lecture recordings, the first technique significantly improved RNN training efficiency, yielding a 14.3% relative word error rate (WER) improvement. The second technique achieved a further 3.1% relative WER improvement. Our sigmoid-type RNN achieved a 10.7% better WER than same-sized deep neural networks, without using long short-term memory cells.
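Note: the abstract does not spell out the pseudo-shuffling procedure. As a hedged illustration only, the sketch below shows one plausible reading of "skipping successive samples": frames of an utterance are visited with a fixed stride so that consecutive SGD updates see non-adjacent, less correlated samples. The function name pseudo_shuffle_indices and the stride-based scheme are assumptions for illustration, not the authors' exact method.

```python
# Hypothetical illustration of frame-skipping "pseudo-shuffling" for SGD.
# Assumption: frames of one utterance are visited with a stride k
# (t, t+k, t+2k, ...), then the sweep restarts at the next offset
# (t+1, t+1+k, ...), so successive updates are less correlated than a
# strictly sequential pass over the frames.

def pseudo_shuffle_indices(num_frames: int, stride: int):
    """Yield frame indices of one utterance in strided (pseudo-shuffled) order."""
    for offset in range(stride):
        for t in range(offset, num_frames, stride):
            yield t

# Example: 10 frames with stride 3 are visited as
# 0, 3, 6, 9, 1, 4, 7, 2, 5, 8 instead of 0, 1, 2, ..., 9.
if __name__ == "__main__":
    print(list(pseudo_shuffle_indices(10, 3)))
```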
Keywords :
"Training","Hidden Markov models","Decoding","Recurrent neural networks","Acoustics","Speech recognition"
Publisher :
ieee
Conference_Titel :
Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on
Type :
conf
DOI :
10.1109/ASRU.2015.7404768
Filename :
7404768