  • DocumentCode
    1686083
  • Title

    Tri-modal speech recognition for noisy and variable lighting conditions

  • Author

    Anderson, S. ; Fong, Alex ; Jie Tang

  • Author_Institution
    Auckland Univ. of Technol., Auckland, New Zealand
  • fYear
    2013
  • Firstpage
    72
  • Lastpage
    73
  • Abstract
    Automatic speech recognition (ASR) has found widespread application in consumer products. However, ASR performance is often compromised in noisy environments. Previous research has shown that adding visual cues can improve ASR performance, particularly in such environments. Audiovisual (AV) ASR, in turn, is not robust against changing lighting conditions, which end users of consumer products often encounter. Since thermal imaging is highly invariant to changing lighting conditions, we propose a tri-modal ASR that combines thermal imaging with audiovisual data (TAV) for consumer applications. Experimental results demonstrate the applicability of this approach over a range of signal-to-noise ratios: tri-modal TAV recognition rates were +39.2% over audio-only and +11.8% over AV recognition rates.
  • Keywords
    acoustic noise; lighting; speech recognition; ASR performance; automatic speech recognition; consumer products; noisy conditions; thermal imaging; trimodal speech recognition; variable lighting conditions; visual cues; Hidden Markov models; Imaging; Lighting; Noise measurement; Speech recognition; Standards; Visualization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Consumer Electronics (ICCE), 2013 IEEE International Conference on
  • Conference_Location
    Las Vegas, NV
  • ISSN
    2158-3994
  • Print_ISBN
    978-1-4673-1361-2
  • Type

    conf

  • DOI
    10.1109/ICCE.2013.6486800
  • Filename
    6486800