DocumentCode :
3529545
Title :
Improving the performance of VTLN under mismatched speaker conditions and making it approach that of matched speaker conditions
Author :
Sanand, D.R. ; Rath, S.P. ; Umesh, S.
Author_Institution :
Dept. of Electr. Eng., Indian Inst. of Technol., Kanpur
fYear :
2009
fDate :
19-24 April 2009
Firstpage :
4397
Lastpage :
4400
Abstract :
The performance of conventional VTLN under mismatched train and test speaker conditions (e.g., adult-train, child-test) does not approach that of matched speaker conditions (e.g., child-train, child-test). In this paper, we investigate this problem and propose methods to reduce the performance gap. We use our recently proposed linear transformation approach to VTLN, which, unlike conventional VTLN, also enables us to study the effect of the Jacobian. The main advantage of transform-based VTLN over adaptation-based approaches (such as CMLLR) is that it does not require any matrix estimation. We argue that the degraded VTLN performance under mismatched speaker conditions is due to the significant frequency warping necessary for normalization, which leads to a mismatch between the correlation among the feature components of the test data and the covariance structure of the trained/normalized model. We show that using a global decorrelating transform (MLLT) improves VTLN performance. Finally, we show that using the Jacobian and MLLT together improves VTLN performance for mismatched cases, with performance approaching that of matched speaker conditions.
Keywords :
Jacobian matrices; covariance matrices; speech processing; Jacobian matrix; covariance structure; frequency warping; linear transformation approach; matrix estimation; vocal tract length normalization; Covariance matrix; Degradation; Frequency estimation; Hidden Markov models; Jacobian matrices; Loudspeakers; Maximum likelihood estimation; Speech recognition; Testing; Training data; Jacobian; Linear Transformation; MLLT; Speaker Normalization; VTLN
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on
Conference_Location :
Taipei
ISSN :
1520-6149
Print_ISBN :
978-1-4244-2353-8
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2009.4960604
Filename :
4960604