A new speaker adaptation technique using very short calibration speech

Author

Zhao, Yunxin

Author_Institution

Panasonic Technologies Inc., Santa Barbara, CA, USA

Volume

2

fYear

1993

fDate

27-30 April 1993

Firstpage

562

Abstract

A speaker adaptation technique based on separating the sources of speech spectra variation is developed to improve speaker-independent continuous speech recognition. The variation sources include speaker acoustic characteristics, phonologic characteristics, and contextual dependency of allophones. Statistical methods are formulated to normalize speech spectra based on speaker acoustic characteristics and then to adapt mixture Gaussian density phone models based on speaker phonologic characteristics. Adaptation experiments using short calibration speech (5 s/speaker) show substantial performance improvement over the baseline recognition system. On a TIMIT test set, where the task vocabulary size is 853 and the test set perplexity is 104, recognition word accuracy improved from 86.9% to 90.6% (28.2% error reduction). On a separate test set, which contains an additional variation source of recording channel mismatch and has a test set perplexity of 101, recognition word accuracy improved from 65.4% to 85.5% (58.1% error reduction).
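
The error reduction figures quoted in the abstract can be checked from the word accuracies alone. The sketch below is not from the paper: it assumes that word error is simply 100 minus word accuracy (both in percent) and that "error reduction" means the relative drop in that error. The function name `relative_error_reduction` is hypothetical and used only for illustration.

```python
def relative_error_reduction(baseline_acc, adapted_acc):
    """Relative reduction in word error, in percent.

    Assumption (not stated in the record): word error = 100 - word accuracy,
    with both accuracies given in percent.
    """
    baseline_err = 100.0 - baseline_acc
    adapted_err = 100.0 - adapted_acc
    return 100.0 * (baseline_err - adapted_err) / baseline_err


# Accuracies reported in the abstract for the two test sets.
timit = relative_error_reduction(86.9, 90.6)
mismatch = relative_error_reduction(65.4, 85.5)

print(f"TIMIT test set: {timit:.1f}% error reduction")
print(f"Channel-mismatch test set: {mismatch:.1f}% error reduction")
```

Under these assumptions the two values round to 28.2 and 58.1, matching the figures given in the abstract.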

Keywords

adaptive systems; calibration; speech recognition; vocabulary; TIMIT test set; calibration speech; contextual dependency of allophones; mixture Gaussian density phone models; performance; perplexity; phonologic characteristics; recognition word accuracy; speaker acoustic characteristics; speaker adaptation technique; speaker-independent continuous speech recognition

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech, and Signal Processing, 1993. ICASSP-93., 1993 IEEE International Conference on

Conference_Location

Minneapolis, MN, USA

ISSN

1520-6149

Print_ISBN

0-7803-7402-9

Type

conf

DOI

10.1109/ICASSP.1993.319369

Filename

319369