Regional Information Center for Science and Technology (RICEST) - Increasing discriminative capability on MAP-based mapping function estimation for acoustic model adaptation

DocumentCode :

2179086

Title :

Increasing discriminative capability on MAP-based mapping function estimation for acoustic model adaptation

Author :

Tsao, Yu ; Isotani, Ryosuke ; Kawai, Hisashi ; Nakamura, Satoshi

Author_Institution :

SLC Group, Nat. Inst. of Inf. & Commun. Technol., Kyoto, Japan

fYear :

2011

fDate :

22-27 May 2011

Firstpage :

5320

Lastpage :

5323

Abstract :

In this study, we propose increasing the discriminative power of maximum a posteriori (MAP)-based mapping function estimation for acoustic model adaptation. Building on the effective and stable learning of MAP-based estimation, we incorporate a discriminative term and derive a new objective function. Applying the new function to online mapping function estimation, we develop discriminative maximum a posteriori (DMAP) linear regression (DMAPLR) and DMAP-based ensemble speaker and speaking environment modeling (DMAP-based ESSEM). We evaluate DMAPLR and DMAP-based ESSEM on the Aurora-2 task in a supervised adaptation mode. The experimental results show that both methods consistently improve on their ML-based and MAP-based counterparts, whether one, two, or three adaptation utterances are used. These improvements confirm the strong effect of increasing the discriminative capability of MAP-based mapping function estimation. Moreover, we verify that including multiple knowledge sources in the objective function can efficiently enhance model adaptation performance. Compared with the baseline, DMAP-based ESSEM achieves a 15.96% relative reduction in average word error rate (WER), from 9.21% to 7.74%, using only one adaptation utterance.
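The 15.96% figure in the abstract is consistent with a relative reduction computed from the two quoted WER values (9.21% and 7.74%). The short sketch below reproduces that arithmetic; the function and variable names are illustrative assumptions and do not come from the paper.

```python
def relative_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Relative WER reduction, in percent, from the baseline to the adapted system."""
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer


# WER values (in percent) quoted in the abstract; names are hypothetical.
baseline_wer = 9.21    # average baseline WER
dmap_essem_wer = 7.74  # average WER with one adaptation utterance

absolute = baseline_wer - dmap_essem_wer
relative = relative_reduction(baseline_wer, dmap_essem_wer)

print(f"absolute reduction: {absolute:.2f} points")
print(f"relative reduction: {relative:.2f}%")
```

The absolute drop is 1.47 percentage points; dividing by the 9.21% baseline gives the 15.96% relative reduction stated in the abstract.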

Keywords :

maximum likelihood estimation; regression analysis; speech recognition; DMAP-based ensemble speaker; MAP-based mapping function estimation; acoustic model adaptation; discriminative maximum a posteriori linear regression; maximum a posteriori-based mapping function estimation; objective function; supervised adaptation mode; word error rate; Acoustics; Adaptation models; Estimation; Hidden Markov models; Speech; Testing; Training; Automatic speech recognition; ESSEM; MAP-based ESSEM; MAPLR; MLLR; discriminative training;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on

Conference_Location :

Prague

ISSN :

1520-6149

Print_ISBN :

978-1-4577-0538-0

Electronic_ISBN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.2011.5947559

Filename :

5947559

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2179086