Title :
Investigation on dimensionality reduction of concatenated features with deep neural network for LVCSR systems
Author :
Yebo Bao ; Hui Jiang ; Cong Liu ; Yu Hu ; Lirong Dai
Author_Institution :
Dept. of Electron. Eng. & Inf. Sci., Univ. of Sci. & Technol. of China, Hefei, China
Abstract :
The hybrid model, context-dependent deep neural network hidden Markov models (CD-DNN-HMMs), has achieved significant improvements on various challenging large vocabulary continuous speech recognition (LVCSR) tasks in recent years. It has further been reported that the gains of DNNs are almost entirely attributable to using features concatenated from consecutive speech frames as the DNN's inputs. This result indicates that DNNs have an excellent ability to exploit high-dimensional features. For GMMs, however, we must resort to dimensionality reduction techniques to avoid the "curse of dimensionality". In this paper, we attempt to derive compact and informative low-dimensional representations from concatenated features for GMMs. PCA, the simplest choice, is considered first, but it does not work well in this setting. We then focus on investigating DNN-based bottleneck features. Experiments on a Mandarin LVCSR task and the Switchboard task both show that the recognition performance of GMM-HMMs trained with bottleneck features (BN-GMM-HMMs) can be comparable to that of CD-DNN-HMMs. Moreover, when discriminative training is leveraged, BN-GMM-HMMs surprisingly provide nearly 8% relative error reduction over CD-DNN-HMMs on the Mandarin LVCSR task.
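The frame-splicing-plus-PCA baseline the abstract considers can be sketched as below. This is a minimal illustration, not the paper's implementation; the context width of ±5 frames, the 39-dimensional base features, and the target dimensionality of 40 are assumptions chosen for the example.

```python
import numpy as np

def splice_frames(feats, context=5):
    """Concatenate each frame with its +/- context neighbors
    (edge frames are padded by repeating the first/last frame)."""
    T, D = feats.shape
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + T] for i in range(2 * context + 1)])

def pca_reduce(X, k):
    """Project the mean-centered data onto its top-k principal components."""
    Xc = X - X.mean(axis=0)
    # SVD of the centered data matrix; rows of Vt are principal directions
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

rng = np.random.default_rng(0)
feats = rng.standard_normal((100, 39))     # 100 frames of 39-dim features
spliced = splice_frames(feats, context=5)  # shape (100, 429): 11 frames x 39 dims
reduced = pca_reduce(spliced, k=40)        # shape (100, 40), suitable for a GMM
```

A DNN bottleneck front-end replaces `pca_reduce` with the activations of a narrow hidden layer in a network trained to predict context-dependent states, which is the nonlinear alternative the paper investigates.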
Keywords :
Gaussian processes; data mining; hidden Markov models; neural nets; principal component analysis; speech recognition; BN-GMM-HMM; CD-DNN-HMM; GMM; LVCSR system; Mandarin LVCSR task; PCA; Switchboard task; bottleneck features; concatenated features; consecutive speech frames; context-dependent deep neural network; dimensionality reduction; hidden Markov models; high-dimensional feature mining; informative low-dimensional representations; large vocabulary continuous speech recognition; recognition performance; deep neural networks
Conference_Titel :
2012 IEEE 11th International Conference on Signal Processing (ICSP)
Conference_Location :
Beijing
Print_ISBN :
978-1-4673-2196-9
DOI :
10.1109/ICoSP.2012.6491550