Regional Information Center for Science and Technology - An acoustic-phonetic-based speaker adaptation technique for improving speaker-independent continuous speech recognition

DocumentCode :

1109722

Title :

An acoustic-phonetic-based speaker adaptation technique for improving speaker-independent continuous speech recognition

Author :

Zhao, Yunxin

Author_Institution :

Speech Technol. Lab., Panasonic Technol. Inc., Santa Barbara, CA, USA

Volume :

Issue :

fYear :

1994

fDate :

7/1/1994

Firstpage :

380

Lastpage :

394

Abstract :

A new speaker adaptation technique, based on a decomposition of spectral variation sources, is proposed for improving speaker-independent continuous speech recognition. The spectral variations are separated into two categories, one acoustic and the other phone-specific, and each variation source is modeled by a linear transformation system. The technique consists of two sequential steps: acoustic normalization is performed first, and the phone model parameters are then adapted. Speaker adaptation experiments on the TIMIT database, using short calibration speech (5 s per speaker), have shown a significant performance improvement over the baseline speaker-independent continuous speech recognition system, which uses hidden Markov models of phone units with Gaussian mixture densities. For a vocabulary size of 853 and a test set perplexity of 104, recognition word accuracy improved from 86.9% for the baseline system to 90.5% after adaptation, corresponding to an error reduction of 27.5%. On a more difficult test set that contains an additional variation source due to recording channel mismatch, an even larger improvement was obtained: for the same vocabulary and a test set perplexity of 101, recognition word accuracy improved from 65.4% for the baseline to 86.0% after adaptation, corresponding to an error reduction of 59.5%.
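The two-step structure described in the abstract (a shared acoustic normalization followed by phone-specific adaptation) can be illustrated with a toy sketch. This is not the paper's estimation procedure. The acoustic step here is a hypothetical per-dimension affine map that matches the speaker's global feature mean and standard deviation to speaker-independent (SI) statistics. The phone-specific step is a MAP-style interpolation of phone means, swapped in for illustration. The function names, the prior weight `tau`, and the assumption that calibration frames are already phone-labeled (e.g. by forced alignment) are all hypothetical.

```python
# Hypothetical sketch of two-step adaptation: (1) acoustic normalization with a
# per-dimension affine map, (2) phone-specific mean adaptation.
# Not the method of the paper; both steps are illustrative substitutes.
from statistics import mean, pstdev


def fit_acoustic_normalization(speaker_frames, si_mean, si_std):
    """Fit x' = a*x + b per feature dimension so the speaker's global
    mean/std match the speaker-independent (SI) statistics."""
    params = []
    for d in range(len(si_mean)):
        column = [frame[d] for frame in speaker_frames]
        mu, sd = mean(column), pstdev(column)
        a = si_std[d] / sd if sd > 0 else 1.0  # avoid division by zero
        b = si_mean[d] - a * mu
        params.append((a, b))
    return params


def apply_normalization(frames, params):
    """Apply the per-dimension affine map to every frame."""
    return [[a * x + b for x, (a, b) in zip(frame, params)] for frame in frames]


def adapt_phone_means(si_phone_means, labeled_frames, tau=10.0):
    """MAP-style interpolation of each phone mean toward the speaker's
    normalized data: mu_hat = (tau * mu_si + n * xbar) / (tau + n).
    Phones unseen in the calibration speech keep their SI means."""
    adapted = {}
    for phone, mu_si in si_phone_means.items():
        frames = labeled_frames.get(phone, [])
        n = len(frames)
        if n == 0:
            adapted[phone] = list(mu_si)
            continue
        xbar = [mean(frame[d] for frame in frames) for d in range(len(mu_si))]
        adapted[phone] = [(tau * m + n * x) / (tau + n) for m, x in zip(mu_si, xbar)]
    return adapted


if __name__ == "__main__":
    # Toy 1-D calibration data; phone labels are assumed to be given.
    calib = [[2.0], [4.0]]
    params = fit_acoustic_normalization(calib, si_mean=[0.0], si_std=[1.0])
    norm = apply_normalization(calib, params)
    adapted = adapt_phone_means(
        {"aa": [-2.0], "iy": [2.0], "uw": [0.5]},
        {"aa": [norm[0]], "iy": [norm[1]]},
        tau=1.0,
    )
    print(params, norm, adapted)
```

Keeping the acoustic step global and the phone step per-phone mirrors the abstract's split between acoustic and phone-specific variation. With only about 5 s of calibration speech, many phones are likely to be unseen, which is why the sketch falls back to the SI means for them.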

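The error-reduction percentages in the abstract are consistent with treating word error as 100% minus word accuracy and computing the relative reduction of that error. A small check, assuming this standard definition (the function name is hypothetical):

```python
def relative_error_reduction(baseline_acc, adapted_acc):
    """Relative word-error reduction (%) from word accuracies given in percent,
    assuming word error = 100 - word accuracy."""
    base_err = 100.0 - baseline_acc
    new_err = 100.0 - adapted_acc
    return 100.0 * (base_err - new_err) / base_err


# Matched test set: 86.9% -> 90.5%; channel-mismatched test set: 65.4% -> 86.0%
print(f"{relative_error_reduction(86.9, 90.5):.1f}")
print(f"{relative_error_reduction(65.4, 86.0):.1f}")
```

For the matched test set the error drops from 13.1% to 9.5% (3.6/13.1, about 27.5%). For the mismatched set it drops from 34.6% to 14.0% (20.6/34.6, about 59.5%).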
Keywords :

acoustic signal processing; speech recognition; Gaussian mixture density based hidden Markov models; TIMIT database; acoustic normalization; acoustic-phonetic-based speaker adaptation technique; decomposition; linear transformation system; performance; phone model parameters; recognition word accuracy; speaker-independent continuous speech recognition; spectral variation sources; test set perplexity; vocabulary size; Calibration; Character recognition; Databases; Decoding; Hidden Markov models; Loudspeakers; Speech recognition; System performance; System testing; Vocabulary;

fLanguage :

English

Journal_Title :

IEEE Transactions on Speech and Audio Processing

Publisher :

IEEE

ISSN :

1063-6676

Type :

jour

DOI :

10.1109/89.294352

Filename :

294352

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1109722