Title :
Combining information from multi-stream features using deep neural network in speech recognition
Author :
Pan Zhou ; Lirong Dai ; Qingfeng Liu ; Hui Jiang
Author_Institution :
Dept. of Electron. Eng. & Inf. Sci., Univ. of Sci. & Technol. of China, Hefei, China
Abstract :
This paper addresses the integration of multi-stream features within the hybrid artificial neural network (ANN) / hidden Markov model (HMM) framework. We investigate the use of log filter bank and MFCC features as separate streams in multi-stream combination for phoneme recognition. An intermediate integration method is proposed to fuse the information from the different feature sets. Using a deep learning algorithm to train the deep neural network (DNN), we explore different stream combination methods. Recognition experiments with a DNN-HMM system on the TIMIT speech data show that the proposed approach not only outperforms the best single stream, with a 6.1% relative reduction in phone error rate (PER), but also outperforms the other fusion strategies.
Keywords :
error statistics; feature extraction; filtering theory; hidden Markov models; neural nets; speech recognition; ANN-HMM; DNN training; DNN-HMM system; MFCC features; PER reduction; TIMIT speech data; deep learning algorithm; deep neural network; deep neural network train; fusion strategies; hybrid artificial neural network hidden Markov model; information fusion; intermediate integration method; log filter bank; multistream combination; multistream features; phone error rate reduction; phoneme recognition; single best stream; stream combination methods; DNN-HMM; deep learning; intermediate integration; multi-stream combination
Conference_Title :
Signal Processing (ICSP), 2012 IEEE 11th International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4673-2196-9
DOI :
10.1109/ICoSP.2012.6491549