DocumentCode
1112300
Title
Dynamic speaker adaptation for feature-based isolated word recognition
Author
Stern, Richard M. ; Lasry, Moshé J.
Author_Institution
Carnegie-Mellon University, Pittsburgh, PA
Volume
35
Issue
6
fYear
1987
fDate
6/1/1987 12:00:00 AM
Firstpage
751
Lastpage
763
Abstract
In this paper, we describe efforts to improve the performance of FEATURE, the Carnegie-Mellon University speaker-independent speech recognition system that classifies isolated letters of the English alphabet, by enabling the system to learn the acoustical characteristics of individual speakers. Even when features are designed to be speaker-independent, it is frequently observed that feature values may vary more from speaker to speaker for a single letter than they vary from letter to letter. In these cases, it is necessary to adjust the system's statistical description of the features of individual speakers to obtain improved recognition performance. This paper describes a set of dynamic adaptation procedures for updating expected feature values during recognition. The algorithm uses maximum a posteriori probability (MAP) estimation techniques to update the mean vectors of sets of feature values on a speaker-by-speaker basis. The MAP estimation algorithm makes use of both knowledge of the observations input to the system from an individual speaker and the relative variability of the features' means within and across all speakers. In addition, knowledge of the covariance of the features' mean vectors across the various letters enables the system to adapt its representation of similar-sounding letters after any one of them is presented to the classifier. The use of dynamic speaker adaptation improves classification performance of FEATURE by 49 percent after four presentations of the alphabet, when the system is provided with supervised training indicating which specific utterance had been presented to the classifier from a particular user. Performance can be improved by as much as 31 percent when the system is allowed to adapt passively in an unsupervised learning mode, without any information from individual users.
Keywords
Aerospace electronics; Bayesian methods; Computer science; Government; Humans; Loudspeakers; Monitoring; Speech recognition; US Department of Defense; Unsupervised learning
fLanguage
English
Journal_Title
Acoustics, Speech and Signal Processing, IEEE Transactions on
Publisher
IEEE
ISSN
0096-3518
Type
jour
DOI
10.1109/TASSP.1987.1165203
Filename
1165203