DocumentCode :
179885
Title :
Deep neural network trained with speaker representation for speaker normalization
Author :
Yun Tang ; Mohan, Archith ; Rose, Richard C. ; Chengyuan Ma
Author_Institution :
Nuance Commun., Burlington, MA, USA
fYear :
2014
fDate :
4-9 May 2014
Firstpage :
6329
Lastpage :
6333
Abstract :
A method for speaker normalization in deep neural network (DNN) based discriminative feature estimation for automatic speech recognition (ASR) is presented. The method is applied to a DNN configured for auto-encoder based low-dimensional bottleneck (AE-BN) feature extraction, where the derived features are used as input to a continuous Gaussian density hidden Markov model (HMM/GMM) based ASR decoder. While AE-BN features are known to provide a significant reduction in ASR word error rate (WER) relative to more conventional spectral magnitude based features, there is no general agreement on how these networks can reduce the impact of speaker variability by incorporating prior knowledge of the speaker. This paper presents an approach in which spectrum-based DNN inputs are augmented with speaker inputs derived from separate regression-based speaker transformations. It is shown that the proposed method reduces the WER by 3% relative to the best speaker-adapted AE-BN CDHMM system.
Keywords :
hidden Markov models; learning (artificial intelligence); neural nets; speaker recognition; AE-BN feature extraction; ASR decoder; ASR word error rate; DNN based discriminative feature estimation; GMM; HMM; WER; auto-encoder based low dimensional bottleneck; automatic speech recognition; continuous Gaussian density hidden Markov model; deep neural network; regression based speaker transformations; speaker normalization; speaker representation; spectral magnitude based features; Adaptation models; Feature extraction; Hidden Markov models; Speech; Training; Transforms; Vectors; Neural networks; speaker adaptation; speaker normalization;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
Type :
conf
DOI :
10.1109/ICASSP.2014.6854822
Filename :
6854822