• DocumentCode
    3008758
  • Title
    Optimal and suboptimal training strategies for automatic speech recognition in noise, and the effects of adaptation on performance
  • Author
    Baker, Janet M. ; Pinto, David F.
  • Author_Institution
    Dragon Systems, Inc., Newton, Massachusetts, U.S.A.
  • Volume
    11
  • fYear
    1986
  • fDate
    31503
  • Firstpage
    745
  • Lastpage
    748
  • Abstract
    The quality of operational speech recognition performance in the presence of variable ambient noise can be significantly affected by the conditions under which patterns are initially trained, as well as by subsequent modifications of these patterns through "adaptation". These effects are clearly evidenced in a series of experiments conducted using the Dragon Systems Speech Driver in conjunction with DragonLAB (an experimenter's workstation facility), commercially available with the IBM PC Voice Communications Option and Voice Recognition Tool Kit, running on an IBM PC/AT. For each of two discrete command/control vocabularies, "Menu" (24 words) and "DOS" (29 words), training and multiple test session recordings were made at each of 55 and 65 dB ambient noise levels with an inexpensive cassette tape-recorder microphone, and at 85 dB with a close-talking noise-cancelling microphone. The seven American English-speaking subjects (4 male, 3 female) included in this database exhibit diverse voice qualities, dialects, speech recognition familiarity, etc. A total of over 20,000 training and test utterances were collected for this database. The results of this series of experiments demonstrate the effects on recognition performance of different training sets on the same test sets, and the effects of adaptation (in a supervised learning mode), both for speaker-dependent and cross-speaker modes. For all speakers and both vocabularies, the best speaker-dependent recognition is consistently obtained when training and test materials are recorded at the same noise level. Much of the deterioration of performance across different training and test conditions can be reduced by employing supervised adaptation during the course of the recognition tests themselves, with each new test token subsequently being used as additional training to adapt its model.
  • Keywords
    Automatic speech recognition; Communication system control; Databases; Microphones; Noise level; Speech enhancement; Speech recognition; Testing; Vocabulary; Workstations;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '86.
  • Type
    conf
  • DOI
    10.1109/ICASSP.1986.1169210
  • Filename
    1169210