• DocumentCode
    103931
  • Title
    Robust tri-modal automatic speech recognition for consumer applications
  • Author
    Anderson, S.J. ; Fong, A.C.M. ; Jie Tang
  • Author_Institution
    Auckland Univ. of Technol., Auckland, New Zealand
  • Volume
    59
  • Issue
    2
  • fYear
    2013
  • fDate
    May 2013
  • Firstpage
    352
  • Lastpage
    360
  • Abstract
    Commercial automatic speech recognition (ASR) started to appear in the late 1980s and can offer a more natural means of accepting user inputs than methods such as typing on keyboards or touch screens. This is a particularly important consideration for small consumer devices such as smartphones. In many practical situations, however, performance of ASR can be significantly compromised due to ambient noise and variable lighting conditions. Previous research has shown that adding visual cues to standard ASR can mitigate the effects of ambient noise. However, audiovisual (AV) ASR is not robust against variable lighting conditions, which are often encountered by users of consumer devices. Since thermal imaging is invariant to changing lighting conditions, the authors propose a trimodal thermal-audiovisual (TAV) ASR using adaptations of established techniques such as MT, DCT and MFCC. Experimental results demonstrate the robustness of this approach over a range of signal-to-noise ratios: tri-modal TAV recognition rates were +39.2% over audio-only ASR and +11.8% over AVASR recognition rates. The authors believe that robust ASR will lead to improved user experiences.
  • Keywords
    speech recognition; ASR; TAV; ambient noise; consumer applications; consumer devices; keyboards; lighting conditions; robust trimodal automatic speech recognition; smartphones; thermal imaging; touch screens; trimodal thermal-audiovisual; variable lighting conditions; Band-pass filters; Lighting; Noise; Speech; Standards; Videos; Visualization; Speech recognition; audiovisual processing; environment adaptation; speaker adaptation; voice control for consumer devices
  • fLanguage
    English
  • Journal_Title
    Consumer Electronics, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0098-3063
  • Type
    jour
  • DOI
    10.1109/TCE.2013.6531117
  • Filename
    6531117