Title :
Investigation on dimensionality reduction of concatenated features with deep neural network for LVCSR systems
Author :
Yebo Bao ; Hui Jiang ; Cong Liu ; Yu Hu ; Lirong Dai
Author_Institution :
Dept. of Electron. Eng. & Inf. Sci., Univ. of Sci. & Technol. of China, Hefei, China
Abstract :
The hybrid model, context-dependent deep neural network hidden Markov models (CD-DNN-HMMs), has achieved significant improvements on various challenging large vocabulary continuous speech recognition (LVCSR) tasks in recent years. It has further been reported that the gains of DNNs are almost entirely attributable to using features concatenated from consecutive speech frames as the DNN's inputs. This result indicates that DNNs have an excellent ability to exploit high-dimensional features. For GMMs, however, we must resort to dimensionality reduction techniques to avoid the "curse of dimensionality". In this paper, we attempt to derive compact and informative low-dimensional representations from concatenated features for GMMs. PCA, the simplest choice, is considered first, but it does not work well in this setting. We then focus on investigating DNN-based bottleneck features. Experiments on a Mandarin LVCSR task and the Switchboard task both show that the recognition performance of GMM-HMMs trained with bottleneck features (BN-GMM-HMMs) can be comparable to that of CD-DNN-HMMs. Moreover, when discriminative training is leveraged, BN-GMM-HMMs surprisingly provide nearly 8% relative error reduction over CD-DNN-HMMs on the Mandarin LVCSR task.
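The frame-splicing-plus-PCA baseline the abstract considers can be sketched as below. This is a minimal illustration, not the paper's implementation; the context width of ±5 frames, the 39-dimensional base features, and the target dimensionality of 40 are assumptions chosen for the example.

```python
import numpy as np

def splice_frames(feats, context=5):
    """Concatenate each frame with its +/- context neighbors
    (edge frames are padded by repeating the first/last frame)."""
    T, D = feats.shape
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + T] for i in range(2 * context + 1)])

def pca_reduce(X, k):
    """Project the mean-centered data onto its top-k principal components."""
    Xc = X - X.mean(axis=0)
    # SVD of the centered data matrix; rows of Vt are principal directions
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

rng = np.random.default_rng(0)
feats = rng.standard_normal((100, 39))     # 100 frames of 39-dim features
spliced = splice_frames(feats, context=5)  # shape (100, 429): 11 frames x 39 dims
reduced = pca_reduce(spliced, k=40)        # shape (100, 40), suitable for a GMM
```

A DNN bottleneck front-end replaces `pca_reduce` with the activations of a narrow hidden layer in a network trained to predict context-dependent states, which is the nonlinear alternative the paper investigates.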
Keywords :
Gaussian processes; data mining; hidden Markov models; neural nets; principal component analysis; speech recognition; BN-GMM-HMM; CD-DNN-HMM; GMM; LVCSR system; Mandarin LVCSR task; PCA; Switchboard task; bottleneck features; concatenated features; consecutive speech frames; context-dependent deep neural network; dimensionality reduction; hidden Markov models; high-dimensional feature mining; informative low-dimensional representations; large vocabulary continuous speech recognition; recognition performance; deep neural networks
Conference_Titel :
2012 IEEE 11th International Conference on Signal Processing (ICSP)
Conference_Location :
Beijing
Print_ISBN :
978-1-4673-2196-9
DOI :
10.1109/ICoSP.2012.6491550