• DocumentCode
    2871979
  • Title

    Noisy audio feature enhancement using audio-visual speech data

  • Author

    Goeck, Roland ; Potamianos, Geraimos ; Net, Chalapathy

  • Author_Institution
    Computer Sciences Laboratory, Australian National University, Canberra ACT 0200, Australia
  • Volume
    2
  • fYear
    2002
  • fDate
    13-17 May 2002
  • Abstract
    We investigate improving automatic speech recognition (ASR) in noisy conditions by enhancing noisy audio features using visual speech captured from the speaker´s face. The enhancement is achieved by applying a linear filter to the concatenated vector of noisy audio and visual features, obtained by mean square error estimation of the clean audio features in a training stage. The performance of the enhanced audio features is evaluated on two ASR tasks: A connected digits task and speaker-independent, large-vocabulary, continuous speech recognition. In both cases and at sufficiently low signal-to-noise ratios (SNRs), ASR trained on the enhanced audio features significantly outperforms ASR trained on the noisy audio, achieving for example a 46% relative reduction in word error rate on the digits task at −3.5 dB SNR. However, the method fails to capture the full visual modality benefit to ASR, as demonstrated by its comparison to discriminant audio-visual feature fusion introduced in previous work.
  • Keywords
    Hidden Markov models; Noise measurement; Signal to noise ratio;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on
  • Conference_Location
    Orlando, FL, USA
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-7402-9
  • Type

    conf

  • DOI
    10.1109/ICASSP.2002.5745030
  • Filename
    5745030