Regional Information Center for Science and Technology - Hybrid speech recognition with Deep Bidirectional LSTM

DocumentCode :

672365

Title :

Hybrid speech recognition with Deep Bidirectional LSTM

Author :

Graves, Alex ; Jaitly, Navdeep ; Mohamed, Abdel-rahman

Author_Institution :

Dept. of Comput. Sci., Univ. of Toronto, Toronto, ON, Canada

fYear :

2013

fDate :

8-12 Dec. 2013

Firstpage :

273

Lastpage :

278

Abstract :

Deep Bidirectional LSTM (DBLSTM) recurrent neural networks have recently been shown to give state-of-the-art performance on the TIMIT speech database. However, the results in that work relied on recurrent-neural-network-specific objective functions, which are difficult to integrate with existing large-vocabulary speech recognition systems. This paper investigates the use of DBLSTM as an acoustic model in a standard neural network-HMM hybrid system. We find that a DBLSTM-HMM hybrid gives results on TIMIT as good as those of the previous work. It also outperforms both GMM and deep network benchmarks on a subset of the Wall Street Journal corpus. However, the improvement in word error rate over the deep network is modest, despite a great increase in frame-level accuracy. We conclude that the hybrid approach with DBLSTM appears to be well suited for tasks where acoustic modelling predominates. Further investigation is needed to understand how to better leverage the improvements in frame-level accuracy towards better word error rates.
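In a neural network-HMM hybrid, the network predicts per-frame HMM state posteriors, and these are commonly divided by the state priors to obtain scaled likelihoods that the HMM decoder can use as emission scores. The sketch below illustrates that standard conversion followed by a Viterbi pass, in plain Python. It is a minimal illustration under stated assumptions, not the authors' implementation: the DBLSTM itself is not modelled (its per-frame log posteriors are simply given), and the function names, the add-one prior smoothing, the two-state left-to-right HMM and all numbers are hypothetical.

```python
import math
from collections import Counter

NEG_INF = float("-inf")


def state_log_priors(alignments, num_states):
    """Estimate log P(s) from frame-level state labels of training alignments.

    Add-one smoothing (an illustrative assumption) keeps unseen states finite.
    """
    counts = Counter(s for utt in alignments for s in utt)
    total = sum(counts.values()) + num_states
    return [math.log((counts[s] + 1) / total) for s in range(num_states)]


def scaled_log_likelihoods(log_posteriors, log_priors):
    """Turn per-frame log P(s|x_t) into log P(s|x_t) - log P(s).

    By Bayes' rule this equals log p(x_t|s) - log p(x_t): the HMM emission
    log likelihood up to a per-frame constant that cannot change the best path.
    """
    return [[lp - pr for lp, pr in zip(frame, log_priors)]
            for frame in log_posteriors]


def viterbi(emissions, log_trans, log_init):
    """Return the highest-scoring HMM state sequence for the emission scores."""
    n = len(log_init)
    delta = [log_init[s] + emissions[0][s] for s in range(n)]
    backptrs = []
    for frame in emissions[1:]:
        new_delta, ptr = [], []
        for s in range(n):
            # Best predecessor state p for arriving in state s at this frame.
            best = max(range(n), key=lambda p: delta[p] + log_trans[p][s])
            new_delta.append(delta[best] + log_trans[best][s] + frame[s])
            ptr.append(best)
        delta = new_delta
        backptrs.append(ptr)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Toy example (hypothetical): posteriors stand in for DBLSTM outputs.
log_priors = state_log_priors([[0, 0, 0, 1], [0, 0, 1]], num_states=2)
posteriors = [[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]]
log_posteriors = [[math.log(p) for p in frame] for frame in posteriors]
emissions = scaled_log_likelihoods(log_posteriors, log_priors)
log_trans = [[math.log(0.5), math.log(0.5)], [NEG_INF, 0.0]]
log_init = [0.0, NEG_INF]
path = viterbi(emissions, log_trans, log_init)
print(path)  # [0, 1, 1]
```

Dividing by the prior matters because frequent states (such as silence) otherwise dominate the posteriors; in practice the priors would be estimated from a forced alignment of the full training set rather than from two toy utterances.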

Keywords :

Gaussian processes; acoustic signal processing; error statistics; recurrent neural nets; speech recognition; DBLSTM-HMM hybrid; GMM; TIMIT speech database; Wall Street Journal corpus; acoustic modelling; deep bidirectional LSTM; deep network benchmarks; frame-level accuracy; hybrid speech recognition; neural network-HMM hybrid system; recurrent neural networks; recurrent-neural-network-specific objective functions; vocabulary speech recognition systems; word error rate; Acoustics; Context; Hidden Markov models; Noise; Speech recognition; Training; Vectors; DBLSTM; HMM-RNN hybrid;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on

Conference_Location :

Olomouc

Type :

conf

DOI :

10.1109/ASRU.2013.6707742

Filename :

6707742

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=672365