DocumentCode :
3744843
Title :
LSTM time and frequency recurrence for automatic speech recognition
Author :
Jinyu Li;Abdelrahman Mohamed;Geoffrey Zweig;Yifan Gong
Author_Institution :
Microsoft Corporation, One Microsoft Way, Redmond, WA 98052
fYear :
2015
Firstpage :
187
Lastpage :
191
Abstract :
Long short-term memory (LSTM) recurrent neural networks (RNNs) have recently shown significant performance improvements over deep feed-forward neural networks (DNNs). A key aspect of these models is the use of time recurrence, combined with a gating architecture that ameliorates the vanishing gradient problem. Inspired by human spectrogram reading, in this paper we propose an extension to LSTMs that performs the recurrence in frequency as well as in time. This model first scans the frequency bands to generate a summary of the spectral information, and then uses the output layer activations as the input to a traditional time LSTM (T-LSTM). Evaluated on a Microsoft short message dictation task, the proposed model obtained a 3.6% relative word error rate reduction over the T-LSTM.
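The pipeline in the abstract — an LSTM that scans frequency bands within each frame to summarize the spectrum, feeding a conventional time LSTM — can be sketched with a toy NumPy implementation. All names, sizes, the random untrained weights, and the choice to concatenate per-band F-LSTM outputs are illustrative assumptions, not details from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_params(in_dim, hid):
    # random, untrained weights for one LSTM layer (gates packed as i, f, o, g)
    return (rng.standard_normal((4 * hid, in_dim)) * 0.1,
            rng.standard_normal((4 * hid, hid)) * 0.1,
            np.zeros(4 * hid))

def lstm_cell(x, h, c, params):
    W, U, b = params
    n = h.shape[0]
    z = W @ x + U @ h + b
    i = 1.0 / (1.0 + np.exp(-z[:n]))           # input gate
    f = 1.0 / (1.0 + np.exp(-z[n:2 * n]))      # forget gate
    o = 1.0 / (1.0 + np.exp(-z[2 * n:3 * n]))  # output gate
    g = np.tanh(z[3 * n:])                     # cell candidate
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

def run_lstm(seq, params, hid):
    # unroll one LSTM over the first axis of seq, returning all hidden states
    h, c = np.zeros(hid), np.zeros(hid)
    outs = []
    for x in seq:
        h, c = lstm_cell(x, h, c, params)
        outs.append(h)
    return np.stack(outs)

# toy setup: T frames of F log-filterbank bins, split into frequency bands
T, F, band, hid_f, hid_t = 5, 40, 8, 16, 32   # sizes are illustrative
feats = rng.standard_normal((T, F))
n_bands = F // band
f_params = make_params(band, hid_f)            # F-LSTM, shared across frames
t_params = make_params(n_bands * hid_f, hid_t)

summaries = []
for frame in feats:
    bands = frame.reshape(n_bands, band)       # scan bands from low to high frequency
    f_out = run_lstm(bands, f_params, hid_f)   # (n_bands, hid_f) spectral summary
    summaries.append(f_out.reshape(-1))        # concatenate per-band outputs

t_out = run_lstm(np.stack(summaries), t_params, hid_t)  # time recurrence on top
print(t_out.shape)  # → (5, 32)
```

The frequency scan runs independently within each frame, so the T-LSTM sees one fixed-length spectral summary per time step, exactly the layering the abstract describes.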
Keywords :
"Time-frequency analysis","Logic gates","Stacking","Hidden Markov models","Recurrent neural networks","Spectrogram","Speech"
Publisher :
ieee
Conference_Titel :
2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)
Type :
conf
DOI :
10.1109/ASRU.2015.7404793
Filename :
7404793