DocumentCode :
1109722
Title :
An acoustic-phonetic-based speaker adaptation technique for improving speaker-independent continuous speech recognition
Author :
Zhao, Yunxin
Author_Institution :
Speech Technol. Lab., Panasonic Technol. Inc., Santa Barbara, CA, USA
Volume :
2
Issue :
3
fYear :
1994
fDate :
7/1/1994
Firstpage :
380
Lastpage :
394
Abstract :
A new speaker adaptation technique is proposed for improving speaker-independent continuous speech recognition, based on a decomposition of the sources of spectral variation. The spectral variations are separated into two categories, one acoustic and the other phone-specific, and each variation source is modeled by a linear transformation system. The technique consists of two sequential steps: first, acoustic normalization is performed; second, the phone model parameters are adapted. Speaker adaptation experiments on the TIMIT database using short calibration speech (5 s per speaker) show a significant performance improvement over the baseline speaker-independent continuous speech recognition system, which uses Gaussian mixture density based hidden Markov models of phone units. For a vocabulary size of 853 and a test set perplexity of 104, the recognition word accuracy improved from 86.9% for the baseline system to 90.5% after adaptation, corresponding to an error reduction of 27.5%. On a more difficult test set that contains an additional variation source due to recording channel mismatch, the improvement was larger: for the same vocabulary and a test set perplexity of 101, the recognition word accuracy improved from 65.4% for the baseline to 86.0% after adaptation, corresponding to an error reduction of 59.5%.
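The error-reduction figures quoted in the abstract appear to be relative reductions in word error rate, where the error rate is taken as 100% minus the word accuracy. A minimal sketch of that arithmetic follows; the function name `relative_error_reduction` is hypothetical and is not taken from the paper.

```python
def relative_error_reduction(baseline_acc, adapted_acc):
    """Relative word-error reduction (%) from two word accuracies given in percent.

    Assumes word error rate = 100 - word accuracy (an assumption, not stated in the record).
    """
    baseline_err = 100.0 - baseline_acc
    adapted_err = 100.0 - adapted_acc
    return 100.0 * (baseline_err - adapted_err) / baseline_err


# Accuracies reported in the abstract for the two test sets.
print(round(relative_error_reduction(86.9, 90.5), 1))  # matched-channel test set
print(round(relative_error_reduction(65.4, 86.0), 1))  # channel-mismatch test set
```

Under this assumption the two calls reproduce the 27.5% and 59.5% reductions stated in the abstract.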
Keywords :
acoustic signal processing; speech recognition; Gaussian mixture density based hidden Markov models; TIMIT database; acoustic normalization; acoustic-phonetic-based speaker adaptation technique; decomposition; linear transformation system; performance; phone model parameters; recognition word accuracy; speaker-independent continuous speech recognition; spectral variation sources; test set perplexity; vocabulary size; Calibration; Character recognition; Databases; Decoding; Hidden Markov models; Loudspeakers; Speech recognition; System performance; System testing; Vocabulary;
fLanguage :
English
Journal_Title :
Speech and Audio Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1063-6676
Type :
jour
DOI :
10.1109/89.294352
Filename :
294352