Title :
Two extensions to ensemble speaker and speaking environment modeling for robust automatic speech recognition
Author :
Tsao, Yu ; Lee, Chin-Hui
Author_Institution :
Georgia Inst. of Technol., Atlanta
Abstract :
Recently, an ensemble speaker and speaking environment modeling (ESSEM) approach to characterizing unknown testing environments was studied for robust speech recognition. Each environment is modeled by a super-vector consisting of the entire set of mean vectors from all Gaussian densities of a set of HMMs for that environment. The super-vector for a new testing environment is then obtained by an affine transformation on the ensemble super-vectors. In this paper, we propose a minimum classification error training procedure to obtain discriminative ensemble elements, and a super-vector clustering technique to achieve refined ensemble structures. We test these two extensions to ESSEM on Aurora2. In a per-utterance unsupervised adaptation mode, we achieved an average WER of 4.99% from 0 dB to 20 dB conditions with these two extensions, compared with a 5.51% WER obtained with the ML-trained gender-dependent baseline. To our knowledge, this represents the best result reported in the literature on the Aurora2 connected digit recognition task.
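The super-vector construction and affine combination described in the abstract can be illustrated with a toy sketch. The abstract does not give the exact form of the transformation or how its parameters are estimated, so this assumes a weighted sum of the ensemble super-vectors plus a bias term, with hand-picked values. The names `build_supervector` and `affine_combine` are hypothetical and do not come from the paper.

```python
# Illustrative sketch only: function names, dimensions, weights, and bias
# are assumptions for demonstration, not values or methods from the paper.

def build_supervector(mean_vectors):
    """Concatenate the Gaussian mean vectors of one environment's HMM set."""
    return [x for mean in mean_vectors for x in mean]


def affine_combine(ensemble, weights, bias):
    """Return sum_p weights[p] * ensemble[p] + bias.

    Assumed form of the affine transformation: one scalar weight per
    ensemble super-vector plus a bias vector of the same dimension.
    """
    if len(ensemble) != len(weights):
        raise ValueError("need exactly one weight per ensemble super-vector")
    if any(len(sv) != len(bias) for sv in ensemble):
        raise ValueError("all super-vectors must match the bias dimension")
    out = list(bias)
    for w, sv in zip(weights, ensemble):
        for i, x in enumerate(sv):
            out[i] += w * x
    return out


# Toy ensemble: two environments, each with two 2-dimensional Gaussian means.
env_a = build_supervector([[1.0, 2.0], [3.0, 4.0]])
env_b = build_supervector([[2.0, 0.0], [1.0, 1.0]])

# Hand-picked parameters; in practice these would be estimated from test data.
new_env = affine_combine(
    [env_a, env_b],
    weights=[0.5, 0.5],
    bias=[0.0, 0.0, 0.0, 1.0],
)
print(new_env)
```

In this sketch the weights and bias are fixed by hand. Estimating them from an unlabeled test utterance, as per-utterance unsupervised adaptation would require, is outside its scope.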
Keywords :
Gaussian processes; hidden Markov models; minimisation; pattern clustering; signal classification; speaker recognition; vectors; Gaussian density; affine transformation; automatic speech recognition; ensemble speaker-speaking environment modeling; ensemble supervector clustering technique; hidden Markov model; minimum classification error training procedure; Acoustic distortion; Acoustic testing; Automatic speech recognition; Automatic testing; Electronic switching systems; Hidden Markov models; Noise robustness; Phase distortion; System testing; Working environment noise; environment modeling; noise robustness
Conference_Title :
Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on
Conference_Location :
Kyoto
Print_ISBN :
978-1-4244-1746-9
Electronic_ISBN :
978-1-4244-1746-9
DOI :
10.1109/ASRU.2007.4430087