• DocumentCode
    763422
  • Title
    Mask estimation for missing data speech recognition based on statistics of binaural interaction
  • Author
    Harding, Sue ; Barker, Jon ; Brown, Guy J.
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Sheffield, UK
  • Volume
    14
  • Issue
    1
  • fYear
    2006
  • Firstpage
    58
  • Lastpage
    67
  • Abstract
    This paper describes a perceptually motivated computational auditory scene analysis (CASA) system that combines sound separation according to spatial location with the "missing data" approach for robust speech recognition in noise. Missing data time-frequency masks are created using probability distributions based on estimates of interaural time and level differences (ITD and ILD) for mixed utterances in reverberated conditions; these masks indicate which regions of the spectrum constitute reliable evidence of the target speech signal. A number of experiments compare the relative efficacy of the binaural cues when used individually and in combination. We also investigate the ability of the system to generalize to acoustic conditions not encountered during training. Performance on a continuous digit recognition task using this method is found to be good, even in a particularly challenging environment with three concurrent male talkers.
  • Keywords
    speech intelligibility; speech recognition; statistical distributions; binaural interaction statistics; computational auditory scene analysis system; interaural level differences; interaural time differences; mask estimation; missing data speech recognition; probability distributions; sound separation; spatial location; Acoustic noise; Auditory system; Automatic speech recognition; Ear; Humans; Image analysis; Noise robustness; Reverberation; Speech recognition; Statistics; binaural; computational auditory scene analysis (CASA); interaural level differences (ILD); interaural time differences (ITD); missing data; reverberation
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TSA.2005.860354
  • Filename
    1561264