DocumentCode :
960130
Title :
Comments on "Vocal Tract Length Normalization Equals Linear Transformation in Cepstral Space"
Author :
Afify, Mohamed ; Siohan, Olivier
Author_Institution :
IBM T. J. Watson Res. Center, Yorktown Heights
Volume :
15
Issue :
5
fYear :
2007
fDate :
7/1/2007 12:00:00 AM
Firstpage :
1731
Lastpage :
1732
Abstract :
The bilinear transformation (BT) is used for vocal tract length normalization (VTLN) in speech recognition systems. We prove two properties of the bilinear mapping that motivated the band-diagonal transform proposed in M. Afify and O. Siohan, "Constrained maximum likelihood linear regression for speaker adaptation," in Proc. ICSLP, Beijing, China, Oct. 2000. This is in contrast to what is stated in M. Pitz and H. Ney, "Vocal tract length normalization equals linear transformation in cepstral space," IEEE Transactions on Speech and Audio Processing, vol. 13, no. 5, pp. 930-944, September 2005, namely that the transform of Afify and Siohan was motivated by empirical observations.
Keywords :
cepstral analysis; maximum likelihood estimation; regression analysis; speaker recognition; speech processing; audio processing; band-diagonal transform; bilinear transformation; cepstral space; linear transformation; maximum likelihood linear regression; speaker adaptation; speech processing; speech recognition system; vocal tract length normalization; Adaptation model; Cepstral analysis; Equations; Frequency; Linear regression; Maximum likelihood linear regression; Natural languages; Speech processing; Speech recognition; Transforms; Maximum-likelihood linear regression; speaker adaptation; speech recognition; vocal tract length normalization;
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2007.896653
Filename :
4244505