DocumentCode :
395204
Title :
Structural speaker adaptation using maximum a posteriori approach and a Gaussian distributions merging technique
Author :
Bellot, Olivier ; Matrouf, Driss ; Nocera, Pascal ; Linares, Georges ; Bonastre, Jean-Francois
Author_Institution :
Lab. d'Informatique d'Avignon, Avignon, France
Volume :
2
fYear :
2003
fDate :
6-10 April 2003
Abstract :
Speaker adaptation techniques aim to enhance speaker-independent acoustic models so that their recognition accuracy comes as close as possible to that obtained with speaker-dependent models. Recently, a technique based on a hierarchical structure and the maximum a posteriori criterion (SMAP) was proposed (Shinoda, K. and Lee, C.-H., Proc. IEEE ICASSP, 1998). As in SMAP, we assume that the acoustic model parameters are organized in a tree containing all the Gaussian distributions. Each node in that tree represents a cluster of Gaussian distributions sharing a common affine transformation that represents the mismatch between training and test conditions. To estimate this affine transformation, we propose a new technique based on merging Gaussians and standard MAP adaptation. This technique is very fast and allows good unsupervised adaptation of both means and variances, even with a small amount of adaptation data. The adaptation strategy has shown a significant performance improvement in a large-vocabulary speech recognition task, both alone and combined with MLLR (maximum likelihood linear regression) adaptation.
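The abstract names two building blocks, merging Gaussians and standard MAP adaptation, without giving formulas. The sketch below illustrates textbook versions of both under stated assumptions: diagonal covariances, a moment-matching merge, and a MAP mean update with hard frame assignment. The function names `merge_gaussians` and `map_adapt_mean` and the prior weight `tau` are hypothetical, and this is not necessarily the procedure the authors use to derive the per-node affine transformation.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def merge_gaussians(
    weights: Sequence[float],
    means: Sequence[Vector],
    variances: Sequence[Vector],
) -> Tuple[Vector, Vector]:
    """Merge weighted diagonal Gaussians into one by moment matching.

    The merged Gaussian has the same mean and per-dimension variance
    as the weighted mixture of the inputs.
    """
    total = sum(weights)
    dim = len(means[0])
    merged_mean = [
        sum(w * m[d] for w, m in zip(weights, means)) / total
        for d in range(dim)
    ]
    merged_var = [
        sum(
            w * (v[d] + (m[d] - merged_mean[d]) ** 2)
            for w, m, v in zip(weights, means, variances)
        ) / total
        for d in range(dim)
    ]
    return merged_mean, merged_var


def map_adapt_mean(
    prior_mean: Vector,
    frames: Sequence[Vector],
    tau: float = 10.0,
) -> Vector:
    """Standard MAP mean update: (tau * prior + sum of frames) / (tau + n).

    Assumption: every frame is assigned to this Gaussian with occupancy 1.
    """
    n = len(frames)
    dim = len(prior_mean)
    return [
        (tau * prior_mean[d] + sum(x[d] for x in frames)) / (tau + n)
        for d in range(dim)
    ]


if __name__ == "__main__":
    # Two equally weighted 1-D Gaussians that form one tree-node cluster.
    mean, var = merge_gaussians([1.0, 1.0], [[0.0], [2.0]], [[1.0], [1.0]])
    # Pull the merged mean toward a few (made-up) adaptation frames.
    adapted = map_adapt_mean(mean, [[2.0], [2.0]], tau=2.0)
    print(mean, var, adapted)
```

With little adaptation data, a large `tau` keeps the adapted mean close to the speaker-independent prior, whereas with many frames the data term dominates. That behaviour is the usual motivation for MAP when adaptation data is scarce.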
Keywords :
Gaussian distribution; acoustic signal processing; maximum likelihood estimation; speech enhancement; speech recognition; trees (mathematics); Gaussian distributions merging; MLLR; affine transformation estimation; large vocabulary speech recognition task; maximum a posteriori approach; maximum likelihood linear regression; recognition accuracy; speaker adaptation; speaker-independent acoustic models; Acoustic testing; Error analysis; Gaussian distribution; Hidden Markov models; Loudspeakers; Maximum likelihood linear regression; Merging; Speech recognition; Training data; Vocabulary;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
ISSN :
1520-6149
Print_ISBN :
0-7803-7663-3
Type :
conf
DOI :
10.1109/ICASSP.2003.1202309
Filename :
1202309