Multi-layer perceptron based speech activity detection for speaker verification

Author

Ganapathy, Sriram; Rajan, Padmanabhan; Hermansky, Hynek

Author_Institution

Dept. of Electr. & Comput. Eng., Johns Hopkins Univ., Baltimore, MD, USA

fYear

2011

fDate

16-19 Oct. 2011

Firstpage

321

Lastpage

324

Abstract

In this paper, we present a speech activity detection (SAD) technique for speaker verification in noisy environments. The proposed SAD is based on phoneme posteriors derived from a multi-layer perceptron (MLP). The MLP is trained using modulation spectral features, where long temporal segments of the speech signal are analyzed in critical bands. In each sub-band, temporal envelopes are derived using an autoregressive modelling technique called frequency domain linear prediction (FDLP). The sub-band envelopes are made robust by a minimum mean square envelope estimation technique. We also experiment with MFCC features processed with cepstral mean subtraction. The speech features are input to the trained MLP to estimate phoneme posterior probabilities. For SAD, all the speech phoneme probabilities are merged into a single speech class to derive speech/non-speech decisions. The proposed SAD is applied to a speaker verification task using noisy versions of the NIST 2008 speaker recognition evaluation (SRE) data, where it provides significant improvements: a relative equal error rate (EER) improvement of about 9% in additive noise and about 19% in reverberant conditions. Furthermore, the improvements are consistent across the two front-ends (FDLP and MFCC) considered here.
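
The posterior-merging step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `speech_decisions`, the position of the non-speech class (`nonspeech_index`), the 0.5 threshold, and the toy posterior values are all assumptions made for illustration. It only shows how per-frame phoneme posteriors could be collapsed into a single speech class and thresholded.

```python
from typing import List, Sequence


def speech_decisions(
    posteriors: Sequence[Sequence[float]],
    nonspeech_index: int = 0,   # assumed position of the non-speech class
    threshold: float = 0.5,     # assumed decision threshold
) -> List[bool]:
    """Return True for frames judged to be speech.

    Each row of `posteriors` holds the class posteriors for one frame.
    All classes except `nonspeech_index` are summed into one speech class.
    """
    decisions = []
    for frame in posteriors:
        # Merge every speech phoneme posterior into a single speech probability.
        speech_prob = sum(p for i, p in enumerate(frame) if i != nonspeech_index)
        decisions.append(speech_prob > threshold)
    return decisions


# Toy posteriors (hypothetical): column 0 is non-speech, columns 1-2 are phonemes.
frames = [
    [0.90, 0.05, 0.05],
    [0.20, 0.50, 0.30],
    [0.55, 0.25, 0.20],
    [0.10, 0.10, 0.80],
]
print(speech_decisions(frames))
```

In practice the frame-level decisions would likely be smoothed over time before frames are selected for speaker verification; that step is omitted here because the abstract does not describe it.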

Keywords

autoregressive processes; cepstral analysis; error statistics; least mean squares methods; maximum likelihood estimation; multilayer perceptrons; signal detection; speaker recognition; speech processing; MFCC; MLP; SAD; autoregressive modelling technique; cepstral mean subtraction; equal error rate; frequency domain linear prediction; minimum mean square envelope estimation; modulation spectral features; multilayer perceptron; phoneme posterior probability; speaker recognition evaluation; speaker verification; speech activity detection; speech phoneme probabilities; speech signal processing; temporal envelopes; temporal segments; Acoustics; Noise; Noise measurement; Speech; Speech processing; Speech recognition; Vectors; Frequency Domain Linear Prediction (FDLP); Speaker Verification; Speech Activity Detection

fLanguage

English

Publisher

IEEE

Conference_Titel

Applications of Signal Processing to Audio and Acoustics (WASPAA), 2011 IEEE Workshop on

Conference_Location

New Paltz, NY

ISSN

1931-1168

Print_ISBN

978-1-4577-0692-9

Electronic_ISBN

1931-1168

Type

conf

DOI

10.1109/ASPAA.2011.6082323

Filename

6082323