Title :
Phone sequence modeling with recurrent neural networks
Author :
Boulanger-Lewandowski, Nicolas ; Droppo, Jasha ; Seltzer, Mike ; Yu, Dong
Author_Institution :
Dept. IRO Montreal, Univ. de Montreal, Montreal, QC, Canada
Abstract :
In this paper, we investigate phone sequence modeling with recurrent neural networks in the context of speech recognition. We introduce a hybrid architecture that combines a phonetic model with an arbitrary frame-level acoustic model, and we propose efficient algorithms for training, decoding and sequence alignment. We evaluate the advantage of our phonetic model on the TIMIT and Switchboard-mini datasets as a complement to a powerful context-dependent deep neural network (DNN) acoustic classifier and a higher-level 3-gram language model. Consistent improvements of 2-10% in phone accuracy and 3% in word error rate suggest that our approach can readily replace hidden Markov models (HMMs) in current state-of-the-art systems.
Keywords :
acoustic signal processing; error statistics; hidden Markov models; recurrent neural nets; signal classification; speech recognition; DNN; HMM; Switchboard-mini datasets; TIMIT datasets; acoustic classifier; arbitrary frame-level acoustic model; context-dependent deep neural network; decoding; higher-level 3-gram language model; hybrid architecture; phone accuracy; phone sequence modeling; phonetic model; recurrent neural networks; sequence alignment; word error rate; Acoustics; Context modeling; Mathematical model; Training
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
DOI :
10.1109/ICASSP.2014.6854638