• DocumentCode
    900341
  • Title

    Discrimination of speech from nonspeech based on multiscale spectro-temporal modulations

  • Author

    Mesgarani, Nima ; Slaney, Malcolm ; Shamma, Shihab A.

  • Author_Institution
    Electr. & Comput. Eng. Dept., Univ. of Maryland, College Park, MD, USA
  • Volume
    14
  • Issue
    3
  • fYear
    2006
  • fDate
    5/1/2006 12:00:00 AM
  • Firstpage
    920
  • Lastpage
    930
  • Abstract
    We describe a content-based audio classification algorithm based on novel multiscale spectro-temporal modulation features inspired by a model of auditory cortical processing. The task explored is to discriminate speech from nonspeech consisting of animal vocalizations, music, and environmental sounds. Although this is a relatively easy task for humans, it is still difficult to automate well, especially in noisy and reverberant environments. The auditory model captures basic processes occurring from the early cochlear stages to the central cortical areas. The model generates a multidimensional spectro-temporal representation of the sound, which is then analyzed by a multilinear dimensionality reduction technique and classified by a support vector machine (SVM). Generalization of the system to signals with high levels of additive noise and reverberation is evaluated and compared to two existing approaches (Scheirer and Slaney, 2002; Kingsbury et al., 2002). The results demonstrate the advantages of the auditory model over the other two systems, especially at low signal-to-noise ratios (SNRs) and high reverberation.
  • Keywords
    audio signal processing; modulation; speech processing; support vector machines; SVM; auditory cortical processing; content-based audio classification; multidimensional spectro-temporal representation; multilinear dimensionality reduction technique; multiscale spectro-temporal modulations; nonspeech; speech discrimination; support vector machine; Acoustic noise; Animals; Classification algorithms; Humans; Music; Reverberation; Speech; Support vector machine classification; Working environment noise; Audio classification and segmentation; auditory model
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type

    jour

  • DOI
    10.1109/TSA.2005.858055
  • Filename
    1621204