Discrimination of speech from nonspeech based on multiscale spectro-temporal modulations

Author

Mesgarani, Nima ; Slaney, Malcolm ; Shamma, Shihab A.

Author_Institution

Electr. & Comput. Eng. Dept., Univ. of Maryland, College Park, MD, USA

Volume

14

Issue

3

fYear

2006

fDate

5/1/2006 12:00:00 AM

Firstpage

920

Lastpage

930

Abstract

We describe a content-based audio classification algorithm based on novel multiscale spectro-temporal modulation features inspired by a model of auditory cortical processing. The task explored is to discriminate speech from nonspeech consisting of animal vocalizations, music, and environmental sounds. Although this is a relatively easy task for humans, it is still difficult to automate well, especially in noisy and reverberant environments. The auditory model captures basic processes occurring from the early cochlear stages to the central cortical areas. The model generates a multidimensional spectro-temporal representation of the sound, which is then analyzed by a multilinear dimensionality reduction technique and classified by a support vector machine (SVM). Generalization of the system to signals with high levels of additive noise and reverberation is evaluated and compared to two existing approaches (Scheirer and Slaney, 2002; Kingsbury et al., 2002). The results demonstrate the advantages of the auditory model over the other two systems, especially at low signal-to-noise ratios (SNRs) and high reverberation.

Keywords

audio signal processing; modulation; speech processing; support vector machines; SVM; auditory cortical processing; content-based audio classification; multidimensional spectro-temporal representation; multilinear dimensionality reduction technique; multiscale spectro-temporal modulations; nonspeech; speech discrimination; support vector machine; Acoustic noise; Animals; Classification algorithms; Humans; Music; Reverberation; Speech; Support vector machine classification; Working environment noise; Audio classification and segmentation; auditory model

fLanguage

English

Journal_Title

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher

IEEE

ISSN

1558-7916

Type

jour

DOI

10.1109/TSA.2005.858055

Filename

1621204