Optimal and suboptimal training strategies for automatic speech recognition in noise, and the effects of adaptation on performance

Author

Baker, Janet M. ; Pinto, David F.

Author_Institution

Dragon Systems, Inc., Newton, Massachusetts, U.S.A.

Volume

11

fYear

1986

fDate

April 1986

Firstpage

745

Lastpage

748

Abstract

The quality of operational speech recognition performance in the presence of variable ambient noise can be significantly affected by the conditions under which patterns are initially trained, as well as by subsequent modifications of these patterns through "adaptation". These effects are clearly evidenced in a series of experiments conducted using the Dragon Systems Speech Driver in conjunction with DragonLAB (an experimenter's workstation facility), commercially available with the IBM PC Voice Communications Option and Voice Recognition Tool Kit, running on an IBM PC/AT. For each of two discrete command/control vocabularies, "Menu" (24 words) and "DOS" (29 words), training and multiple test session recordings were made at each of 55 and 65 dB ambient noise levels with an inexpensive cassette tape-recorder microphone, and at 85 dB with a close-talking noise-cancelling microphone. The seven American English-speaking subjects (4 male, 3 female) included in this database exhibit diverse voice qualities, dialects, speech recognition familiarity, etc. A total of over 20,000 training and test utterances were collected for this database. The results of this series of experiments demonstrate the effects on recognition performance of different training sets on the same test sets, and the effects of adaptation (in a supervised learning mode), both for speaker-dependent and cross-speaker modes. For all speakers and both vocabularies, the best speaker-dependent recognition is consistently obtained when training and test materials are recorded at the same noise level. Much of the deterioration of performance across different training and test conditions can be reduced by employing supervised adaptation during the course of the recognition tests themselves, with each new test token subsequently being used as additional training to adapt its model.
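
The supervised adaptation loop described in the abstract can be illustrated with a minimal Python sketch. Everything in it is a hypothetical stand-in and not the Dragon Systems implementation: each word model is assumed to be a running mean of fixed-length feature vectors, recognition is assumed to be nearest-model matching under squared Euclidean distance, and the names `WordModel`, `recognize`, and `run_test` are invented for illustration. The only element taken from the abstract is the procedure itself: after each test token is recognized, it is added as training material to the model of its known (supervised) word.

```python
from dataclasses import dataclass


@dataclass
class WordModel:
    """Hypothetical word model: a running mean of feature vectors."""
    mean: list
    count: int = 1

    def adapt(self, token):
        # Fold one more token into the running mean (supervised update).
        self.count += 1
        self.mean = [m + (x - m) / self.count for m, x in zip(self.mean, token)]


def distance(a, b):
    # Squared Euclidean distance; an assumed stand-in for the real matcher.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def recognize(models, token):
    # Return the vocabulary word whose model lies closest to the token.
    return min(models, key=lambda word: distance(models[word].mean, token))


def run_test(models, tokens, adapt=False):
    """Score labelled (word, feature_vector) tokens; optionally adapt after each."""
    correct = 0
    for label, token in tokens:
        if recognize(models, token) == label:
            correct += 1
        if adapt:
            # Supervised mode: the true label selects which model to update.
            models[label].adapt(token)
    return correct / len(tokens)
```

In this toy setting, models trained under one condition and tested on tokens shifted by a different condition start out mismatched, and calling `run_test(..., adapt=True)` lets each model drift toward the test condition as the session proceeds.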

Keywords

Automatic speech recognition; Communication system control; Databases; Microphones; Noise level; Speech enhancement; Speech recognition; Testing; Vocabulary; Workstations;

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '86.

Type

conf

DOI

10.1109/ICASSP.1986.1169210

Filename

1169210