Title :
Increasing discriminative capability on MAP-based mapping function estimation for acoustic model adaptation
Author :
Tsao, Yu ; Isotani, Ryosuke ; Kawai, Hisashi ; Nakamura, Satoshi
Author_Institution :
SLC Group, Nat. Inst. of Inf. & Commun. Technol., Kyoto, Japan
Abstract :
In this study, we propose increasing the discriminative power of maximum a posteriori (MAP)-based mapping function estimation for acoustic model adaptation. Building on the effective and stable learning of MAP-based estimation, we incorporate a discriminative term and derive a new objective function. By applying the new function to online mapping function estimation, we develop discriminative maximum a posteriori (DMAP) linear regression (DMAPLR) and DMAP-based ensemble speaker and speaking environment modeling (DMAP-based ESSEM). We evaluate DMAPLR and DMAP-based ESSEM on the Aurora-2 task in a supervised adaptation mode. The experimental results show that both DMAPLR and DMAP-based ESSEM consistently improve on their ML-based and MAP-based counterparts, whether one, two, or three adaptation utterances are used. These improvements confirm the strong effect of increasing discriminative capability in MAP-based mapping function estimation. Moreover, we verify that including multiple knowledge sources in the objective function can efficiently enhance model adaptation performance. Compared with the baseline result, DMAP-based ESSEM achieves a 15.96% relative reduction in average word error rate (WER), from 9.21% to 7.74%, using only one adaptation utterance.
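The 15.96% figure in the abstract is consistent with a relative reduction computed from the two quoted WER values, not an absolute difference. A minimal sketch of that arithmetic follows; the function name relative_wer_reduction is a hypothetical helper introduced here for illustration and does not come from the paper.

```python
def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Return the relative WER reduction, in percent, with respect to the baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer


# WER values quoted in the abstract: baseline 9.21%, DMAP-based ESSEM 7.74%.
baseline, adapted = 9.21, 7.74
reduction = relative_wer_reduction(baseline, adapted)
absolute = baseline - adapted
print(f"{reduction:.2f}% relative, {absolute:.2f} points absolute")
```

The absolute gain is 1.47 percentage points; dividing it by the 9.21% baseline gives the relative figure reported.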
Keywords :
maximum likelihood estimation; regression analysis; speech recognition; DMAP-based ensemble speaker; MAP-based mapping function estimation; acoustic model adaptation; discriminative maximum a posteriori linear regression; maximum a posteriori-based mapping function estimation; objective function; supervised adaptation mode; word error rate; Acoustics; Adaptation models; Estimation; Hidden Markov models; Speech; Testing; Training; Automatic speech recognition; ESSEM; MAP-based ESSEM; MAPLR; MLLR; discriminative training
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on
Conference_Location :
Prague
Print_ISBN :
978-1-4577-0538-0
ISSN :
1520-6149
DOI :
10.1109/ICASSP.2011.5947559