  • DocumentCode
    3606744
  • Title
    A Framework for Speech Activity Detection Using Adaptive Auditory Receptive Fields
  • Author
    Carlin, Michael A. ; Elhilali, Mounya
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Johns Hopkins Univ., Baltimore, MD, USA
  • Volume
    23
  • Issue
    12
  • fYear
    2015
  • Firstpage
    2422
  • Lastpage
    2433
  • Abstract
    One of the hallmarks of sound processing in the brain is the ability of the nervous system to adapt to changing behavioral demands and surrounding soundscapes. It can dynamically shift sensory and cognitive resources to focus on relevant sounds. Neurophysiological studies indicate that this ability is supported by adaptively retuning the shapes of cortical spectro-temporal receptive fields (STRFs) to enhance features of target sounds while suppressing those of task-irrelevant distractors. Because an important component of human communication is the ability of a listener to dynamically track speech in noisy environments, the solution obtained by auditory neurophysiology implies a useful adaptation strategy for speech activity detection (SAD). SAD is an important first step in a number of automated speech processing systems, and performance is often reduced in highly noisy environments. In this paper, we describe how task-driven adaptation is induced in an ensemble of neurophysiological STRFs, and show how speech-adapted STRFs reorient themselves to enhance spectro-temporal modulations of speech while suppressing those associated with a variety of nonspeech sounds. We then show how an adapted ensemble of STRFs can better detect speech in unseen noisy environments compared to an unadapted ensemble and a noise-robust baseline. Finally, we use a stimulus reconstruction task to demonstrate how the adapted STRF ensemble better captures the spectro-temporal modulations of attended speech in clean and noisy conditions. Our results suggest that a biologically plausible adaptation framework can be applied to speech processing systems to dynamically adapt feature representations for improving noise robustness.
  • Keywords
    hearing; neurophysiology; speech processing; adaptive auditory receptive fields; auditory neurophysiology; automated speech processing systems; behavioral demands; biologically plausible adaptation framework; feature representations; noise robustness; noise-robust baseline; nonspeech sounds; sound processing; spectro-temporal modulations; spectro-temporal receptive fields; speech activity detection; stimulus reconstruction task; surrounding soundscapes; task-driven adaptation; unseen noisy environments; Adaptation models; Adaptive filtering; Brain models; Modulation; Nervous system; Neurons; Noise measurement; Speech processing; neural plasticity; speech activity detection (SAD); stimulus reconstruction
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    2329-9290
  • Type
    jour
  • DOI
    10.1109/TASLP.2015.2481179
  • Filename
    7274353