DocumentCode
2325503
Title
Fast speaker adaptation of artificial neural networks for automatic speech recognition
Author
Dupont, Stéphane ; Cheboub, Leila
Author_Institution
TCTS-MULTITEL, Faculté Polytechnique de Mons, Belgium
Volume
3
fYear
2000
fDate
2000
Firstpage
1795
Abstract
This paper presents a fast speaker adaptation technique for automatic speech recognition systems that use artificial neural networks (ANNs) to estimate hidden Markov model (HMM) state probabilities. Speaker-adapted ANNs are first obtained from the training data using affine transformations in the feature space. As in the “eigenvoice” approach, principal component analysis (PCA) is then applied to these transformation matrices. The first few eigenvectors span a low-dimensional space that captures most of the inter-speaker variability of the training set. During operation, these eigenvectors are used to constrain the optimization of the transformation matrices for new speakers. This optimization is performed by steepest descent, with gradients obtained by backpropagation through the speaker-independent ANN. Experiments use state-of-the-art hybrid HMM/ANN systems trained on the Phonebook database. Supervised adaptation experiments with different amounts of data show that the new technique outperforms standard linear regression in the feature space: with only 20 words of adaptation data, results show a 15% relative decrease in the word error rate.
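The constrained optimization described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a single fixed softmax layer stands in for the speaker-independent ANN, the two basis directions are hand-picked rather than obtained by PCA over training speakers, and all names (`THETA0`, `BASIS`, `NET_W`, `NET_B`, `FRAMES`, `adapt`) and values are hypothetical. The affine transform is written as a mean transform plus a weighted sum of basis directions, and only the weights are updated by steepest descent, using gradients backpropagated through the fixed network.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def compose(theta0, basis, w):
    # theta = theta0 + sum_k w[k] * basis[k]; theta holds row-major A, then b.
    return [t + sum(wk * e[i] for wk, e in zip(w, basis))
            for i, t in enumerate(theta0)]

def apply_affine(theta, x, dim):
    # y = A x + b
    return [sum(theta[i * dim + j] * x[j] for j in range(dim))
            + theta[dim * dim + i] for i in range(dim)]

def loss_and_grad(w, theta0, basis, net_w, net_b, frames, dim):
    """Mean cross-entropy and its gradient with respect to the weights w."""
    theta = compose(theta0, basis, w)
    loss = 0.0
    g_theta = [0.0] * len(theta)
    for x, label in frames:
        y = apply_affine(theta, x, dim)
        logits = [sum(row[j] * y[j] for j in range(dim)) + c
                  for row, c in zip(net_w, net_b)]
        p = softmax(logits)
        loss -= math.log(p[label])
        # Backpropagate through the fixed network (its weights are not updated).
        d_logits = [pc - (1.0 if c == label else 0.0) for c, pc in enumerate(p)]
        d_y = [sum(net_w[c][i] * d_logits[c] for c in range(len(net_w)))
               for i in range(dim)]
        for i in range(dim):
            for j in range(dim):
                g_theta[i * dim + j] += d_y[i] * x[j]   # dL/dA
            g_theta[dim * dim + i] += d_y[i]            # dL/db
    n = len(frames)
    # Project the full gradient onto the basis directions.
    g_w = [sum(e[i] * g_theta[i] for i in range(len(theta))) / n for e in basis]
    return loss / n, g_w

def adapt(theta0, basis, net_w, net_b, frames, dim, step=0.05, iters=200):
    """Steepest descent on the basis weights only."""
    w = [0.0] * len(basis)
    for _ in range(iters):
        _, g = loss_and_grad(w, theta0, basis, net_w, net_b, frames, dim)
        w = [wk - step * gk for wk, gk in zip(w, g)]
    return w

# Toy setup; every value below is an illustrative assumption.
DIM = 2
THETA0 = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]     # identity A, zero b
BASIS = [[0.0, 0.0, 0.0, 0.0, 1.0, 0.0],    # shift of feature 0
         [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]]    # uniform scaling
NET_W = [[2.0, 0.0], [-2.0, 0.0]]           # fixed two-class softmax layer
NET_B = [0.0, 0.0]
# Labeled adaptation frames from a "new speaker" whose feature 0 is shifted.
FRAMES = [([0.0, 0.5], 0), ([0.2, -0.3], 0), ([-2.0, 0.1], 1), ([-1.8, 0.4], 1)]

loss_before, _ = loss_and_grad([0.0, 0.0], THETA0, BASIS, NET_W, NET_B, FRAMES, DIM)
w = adapt(THETA0, BASIS, NET_W, NET_B, FRAMES, DIM)
loss_after, _ = loss_and_grad(w, THETA0, BASIS, NET_W, NET_B, FRAMES, DIM)
print(loss_before, loss_after, w)
```

Because only two weights are estimated instead of the six entries of the full affine transform, far fewer adaptation frames are needed to obtain a stable estimate, which is the motivation the abstract gives for constraining the search to the leading eigenvectors.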
Keywords
backpropagation; eigenvalues and eigenfunctions; estimation theory; hidden Markov models; neural nets; optimisation; principal component analysis; probability; speech recognition; HMMs state probability estimation; Phonebook database; affine transformations; artificial neural networks; automatic speech recognition; backpropagation; eigenvectors; eigenvoice; fast speaker adaptation; gradients; hidden Markov models; inter-speaker variability; optimization; performance; principal components analysis; speaker independent ANN; steepest descent; training data; training set; transformation matrices; word error rate; Artificial neural networks; Automatic speech recognition; Backpropagation; Constraint optimization; Hidden Markov models; Linear regression; Principal component analysis; Spatial databases; State estimation; Training data;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, 2000. ICASSP '00. Proceedings. 2000 IEEE International Conference on
Conference_Location
Istanbul
ISSN
1520-6149
Print_ISBN
0-7803-6293-4
Type
conf
DOI
10.1109/ICASSP.2000.862102
Filename
862102