Title :
Improving deep neural networks using state projection vectors of subspace Gaussian mixture model as features
Author :
Murali Karthick, B. ; Umesh, S.
Author_Institution :
Dept. of Electr. Eng., Indian Inst. of Technol. - Madras, Chennai, India
Abstract :
Recent advances in deep neural networks (DNNs) have allowed them to surpass the conventional hidden Markov model-Gaussian mixture model (HMM-GMM) framework, owing to their efficient training procedure. Providing richer phonetic context information at the input further improves DNN performance. The state projection vectors (state-specific vectors) of the subspace Gaussian mixture model (SGMM) capture phonetic information in a low-dimensional vector space. In this paper, we propose to use the state-specific vectors of the SGMM as features, thereby providing additional phonetic information to the DNN framework. Each observation vector in the training data is aligned with the corresponding SGMM state-specific vector to form the state-specific vector feature set. A linear discriminant analysis (LDA) feature set is formed by applying LDA to the training data. Since bottleneck features are effective at extracting discriminative information about phonemes, both the LDA feature set and the state-specific vector feature set are converted to bottleneck features. The bottleneck features of both sets serve as input for training a single DNN. Relative improvements of 8.8% on the TIMIT database (core test set) and 9.7% on the WSJ corpus are obtained by adding the state-specific vector bottleneck feature set, compared with a DNN trained only on the LDA bottleneck feature set. Furthermore, training a deep belief network-DNN (DBN-DNN) with the proposed feature set attains a WER of 20.46% on the TIMIT core test set, demonstrating the effectiveness of our method. The state-specific vectors, when used as features, provide additional useful information about phoneme variation; combining them with LDA bottleneck features therefore yields improved performance in the DNN framework.
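The following is a minimal sketch (not the authors' implementation) of the feature-construction step described in the abstract: each frame is mapped to the state-specific vector of its aligned SGMM state, and the two bottleneck streams are concatenated as DNN input. All function names, array shapes, and the use of NumPy are illustrative assumptions.

```python
# Hypothetical sketch of the feature construction described in the abstract.
# Assumes SGMM state-specific vectors and per-frame state alignments are
# already available as NumPy arrays; shapes and names are illustrative.
import numpy as np

def state_specific_features(frame_alignment, state_vectors):
    """Map each frame to the state-specific vector of its aligned SGMM state.

    frame_alignment : (T,) int array of SGMM state indices, one per frame
    state_vectors   : (J, S) array, row j is the S-dim state-specific vector
    returns         : (T, S) array, the state-specific vector feature set
    """
    return state_vectors[frame_alignment]

def combined_dnn_input(lda_bottleneck, ssv_bottleneck):
    """Concatenate the two bottleneck streams frame-by-frame to form the
    input of the single DNN trained on both feature sets."""
    assert lda_bottleneck.shape[0] == ssv_bottleneck.shape[0]
    return np.hstack([lda_bottleneck, ssv_bottleneck])

if __name__ == "__main__":
    # Illustrative sizes: 1000 frames, 150 SGMM states, 40-dim vectors,
    # 40-dim bottleneck outputs per stream (stand-in random values).
    rng = np.random.default_rng(0)
    align = rng.integers(0, 150, size=1000)
    v = rng.standard_normal((150, 40))
    ssv_feats = state_specific_features(align, v)   # (1000, 40)
    lda_bn = rng.standard_normal((1000, 40))        # LDA bottleneck stream
    ssv_bn = rng.standard_normal((1000, 40))        # state-vector bottleneck stream
    dnn_input = combined_dnn_input(lda_bn, ssv_bn)  # (1000, 80) DNN input
```

In the paper, each stream is first passed through its own bottleneck network; here random arrays merely stand in for those bottleneck outputs to show the frame-level concatenation.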
Keywords :
Gaussian processes; hidden Markov models; learning (artificial intelligence); mixture models; neural nets; speech processing; statistical analysis; DBN-DNN; HMM-GMM framework; LDA feature set; SGMM; TIMIT database; WSJ corpus; deep belief network training; deep neural networks; discriminative information extraction; hidden Markov model-Gaussian mixture model; linear discriminant analysis; low dimensional vector space; phoneme variation; phonetic context information; state projection vectors; state specific vector bottleneck feature set; subspace Gaussian mixture model; Acoustics; Feature extraction; Hidden Markov models; Neural networks; Speech; Training; Vectors; Deep neural network; SGMM; bottleneck features; state specific vectors;
Conference_Title :
2014 IEEE Spoken Language Technology Workshop (SLT)
DOI :
10.1109/SLT.2014.7078562