Dynamic speaker adaptation for feature-based isolated word recognition

Author

Stern, Richard M. ; Lasry, Moshé J.

Author_Institution

Carnegie-Mellon University, Pittsburgh, PA

Volume

35

Issue

6

fYear

1987

fDate

6/1/1987 12:00:00 AM

Firstpage

751

Lastpage

763

Abstract

In this paper, we describe efforts to improve the performance of FEATURE, the Carnegie-Mellon University speaker-independent speech recognition system that classifies isolated letters of the English alphabet, by enabling the system to learn the acoustical characteristics of individual speakers. Even when features are designed to be speaker-independent, it is frequently observed that feature values may vary more from speaker to speaker for a single letter than they vary from letter to letter. In these cases, it is necessary to adjust the system's statistical description of the features of individual speakers to obtain improved recognition performance. This paper describes a set of dynamic adaptation procedures for updating expected feature values during recognition. The algorithm uses maximum a posteriori probability (MAP) estimation techniques to update the mean vectors of sets of feature values on a speaker-by-speaker basis. The MAP estimation algorithm makes use of both knowledge of the observations input to the system from an individual speaker and the relative variability of the features' means within and across all speakers. In addition, knowledge of the covariance of the features' mean vectors across the various letters enables the system to adapt its representation of similar-sounding letters after any one of them is presented to the classifier. The use of dynamic speaker adaptation improves classification performance of FEATURE by 49 percent after four presentations of the alphabet, when the system is provided with supervised training indicating which specific utterance had been presented to the classifier from a particular user. Performance can be improved by as much as 31 percent when the system is allowed to adapt passively in an unsupervised learning mode, without any information from individual users.
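
The MAP update of a mean described above can be illustrated with a minimal sketch for a single scalar feature. This is not the paper's algorithm, which operates on mean vectors and also exploits covariance across letters; it only shows the textbook Gaussian case, assuming a Gaussian prior on the speaker's mean (with across-speaker variance) and Gaussian observations with known within-speaker variance. The function name `map_mean` and all parameter names are hypothetical.

```python
def map_mean(prior_mean, across_var, within_var, observations):
    """MAP estimate of one speaker's mean for a single scalar feature.

    Assumptions (illustrative, not taken from the paper):
      - the speaker's true mean has a Gaussian prior N(prior_mean, across_var),
        where across_var is the variance of the mean across speakers;
      - each observation is Gaussian around the speaker's mean with known
        variance within_var (variability within one speaker).
    """
    n = len(observations)
    if n == 0:
        # No data from this speaker yet: fall back on the speaker-independent mean.
        return prior_mean
    total = sum(observations)
    # Precision-weighted blend of the prior mean and the speaker's own data.
    return (within_var * prior_mean + across_var * total) / (within_var + n * across_var)


# Three observations of 2.0 with equal prior and observation variances.
estimate = map_mean(prior_mean=0.0, across_var=1.0, within_var=1.0,
                    observations=[2.0, 2.0, 2.0])
```

With no observations the estimate equals the prior mean, and as more utterances from the same speaker arrive it moves toward that speaker's sample mean, which is the qualitative behavior the abstract attributes to dynamic adaptation.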

Keywords

Aerospace electronics; Bayesian methods; Computer science; Government; Humans; Loudspeakers; Monitoring; Speech recognition; US Department of Defense; Unsupervised learning;

fLanguage

English

Journal_Title

Acoustics, Speech and Signal Processing, IEEE Transactions on

Publisher

IEEE

ISSN

0096-3518

Type

jour

DOI

10.1109/TASSP.1987.1165203

Filename

1165203