DocumentCode :
2279376
Title :
Eliminating inter-speaker variability prior to discriminant transforms
Author :
Saon, George ; Padmanabhan, Mukund ; Gopinath, Ramesh
Author_Institution :
IBM T. J. Watson Res. Center, Yorktown Heights, NY, USA
fYear :
2001
fDate :
2001
Firstpage :
73
Lastpage :
76
Abstract :
This paper shows the impact of applying speaker normalization techniques, such as vocal tract length normalization (VTLN) and speaker-adaptive training (SAT), prior to discriminant feature space transforms, such as linear discriminant analysis (LDA). We demonstrate that removing the inter-speaker variability with speaker compensation methods results in improved discrimination, as measured by the LDA eigenvalues, and in improved classification accuracy, as measured by the word error rate. Experimental results on the SPINE (speech in noisy environments) database indicate an improvement of up to 5% relative over the standard case, in which speaker adaptation (during testing and training) is applied after an LDA transform that is trained in a speaker-independent manner. We conjecture that performing LDA in a canonical (speaker-normalized) feature space is more effective than LDA in a speaker-independent space because the eigenvectors carve a subspace of maximum intra-speaker phonetic separability, whereas in the speaker-independent case this subspace is also defined by the inter-speaker variability. Indeed, we show that the more normalization is performed (first VTLN, then SAT), the higher the LDA eigenvalues become.
Keywords :
eigenvalues and eigenfunctions; error statistics; feature extraction; learning (artificial intelligence); pattern classification; speech recognition; SPINE database; canonical feature space; cepstral feature extraction; discriminant transforms; eigenvalues; eigenvectors; feature space transforms; inter-speaker variability; linear discriminant analysis; speaker adaptation; speaker normalization techniques; speaker-adaptive training; statistical pattern classification; vocal tract length normalization; word error rate; Auditory system; Cepstral analysis; Eigenvalues and eigenfunctions; Error analysis; Feature extraction; Humans; Linear discriminant analysis; Spatial databases; Speech recognition; Testing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Automatic Speech Recognition and Understanding, 2001. ASRU '01. IEEE Workshop on
Print_ISBN :
0-7803-7343-X
Type :
conf
DOI :
10.1109/ASRU.2001.1034592
Filename :
1034592