DocumentCode :
112359
Title :
Speech Recognition Using Long-Span Temporal Patterns in a Deep Network Model
Author :
Siniscalchi, Sabato Marco ; Dong Yu ; Li Deng ; Chin-Hui Lee
Author_Institution :
Fac. of Archit. & Eng., Univ. of Enna Kore, Enna, Italy
Volume :
20
Issue :
3
fYear :
2013
fDate :
March 2013
Firstpage :
201
Lastpage :
204
Abstract :
In recent years, there has been a renewed interest in the use of artificial neural networks (ANNs) for speech applications, and it seems that a new trend to move the speech technology forward has begun. Two main contributions have triggered such a new trend: 1) a major advance has been made in training the weights in deep neural networks (DNNs), and a pre-trained deep neural network hidden Markov model (DNN-HMM) hybrid architecture has outperformed a conventional Gaussian mixture model hidden Markov model (GMM-HMM) automatic speech recognition (ASR) system on a challenging business search dataset, and 2) it has been shown that phoneme classification can be boosted by using a hierarchical structure of multi-layer perceptrons (MLPs) trained to model long-span temporal patterns with beneficial effects on language recognition tasks. In this work, we combine these two lines of research and demonstrate that word recognition accuracy can be significantly enhanced by arranging DNNs in a hierarchical structure to model long-term energy trajectories. The proposed solution has been evaluated on the 5000-word Wall Street Journal task, resulting in consistent and significant improvements in both phone and word recognition accuracy rates. We have also analyzed the effects of various modeling choices on the system performance, and several architectural solutions have been compared.
Keywords :
hidden Markov models; multilayer perceptrons; speech recognition; ANN; DNN-HMM hybrid architecture; GMM-HMM automatic speech recognition system; Gaussian mixture model hidden Markov model ASR system; MLP; Wall Street Journal task; artificial neural networks; business search dataset; hierarchical structure; language recognition tasks; long-span temporal patterns; long-term energy trajectories; multilayer perceptrons; phone accuracy rate; phoneme classification; pre-trained deep neural network hidden Markov model; speech technology; word recognition accuracy; word recognition accuracy rate; Computational modeling; Data models; Hidden Markov models; Neural networks; Speech; Speech recognition; Training; Automatic speech recognition; deep neural networks; large vocabulary continuous speech recognition
fLanguage :
English
Journal_Title :
Signal Processing Letters, IEEE
Publisher :
IEEE
ISSN :
1070-9908
Type :
jour
DOI :
10.1109/LSP.2013.2237901
Filename :
6403509