DocumentCode
2357866
Title
Multi-layer perceptron based speech activity detection for speaker verification
Author
Ganapathy, Sriram ; Rajan, Padmanabhan ; Hermansky, Hynek
Author_Institution
Dept. of Electr. & Comput. Eng., Johns Hopkins Univ., Baltimore, MD, USA
fYear
2011
fDate
16-19 Oct. 2011
Firstpage
321
Lastpage
324
Abstract
In this paper, we present a speech activity detection (SAD) technique for speaker verification in noisy environments. The proposed SAD is based on phoneme posteriors derived from a multi-layer perceptron (MLP). The MLP is trained using modulation spectral features, where long temporal segments of the speech signal are analyzed in critical bands. In each sub-band, temporal envelopes are derived using an autoregressive modelling technique called frequency domain linear prediction (FDLP). Robustness of the sub-band envelopes is achieved with a minimum mean square envelope estimation technique. We also experiment with MFCC features processed with cepstral mean subtraction. The speech features are input to the trained MLP to estimate phoneme posterior probabilities. For SAD, all the speech phoneme probabilities are merged into a single speech class to derive speech/non-speech decisions. The proposed SAD is applied to a speaker verification task using noisy versions of NIST 2008 speaker recognition evaluation (SRE) data, where it provides significant improvements (relative equal error rate (EER) improvement of about 9% in additive noise and about 19% in reverberant conditions). Furthermore, the improvements are consistent across the two front-ends (FDLP and MFCC) considered here.
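The merging step described in the abstract (summing all speech phoneme posteriors into one speech class) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `speech_decisions`, the layout of the posterior matrix, the choice of which class indices count as non-speech, and the 0.5 decision threshold are all assumptions made for illustration. The abstract does not state the actual decision rule or any smoothing applied to it.

```python
import numpy as np


def speech_decisions(posteriors, nonspeech_idx, threshold=0.5):
    """Merge per-frame phoneme posteriors into a single speech probability.

    posteriors    : array of shape (frames, classes); each row is assumed
                    to be an MLP posterior vector that sums to 1.
    nonspeech_idx : indices of the non-speech classes (hypothetical layout,
                    e.g. a silence class at index 0).
    threshold     : assumed decision threshold, not taken from the paper.
    """
    posteriors = np.asarray(posteriors, dtype=float)

    # Every class that is not marked as non-speech counts as speech.
    speech_mask = np.ones(posteriors.shape[1], dtype=bool)
    speech_mask[list(nonspeech_idx)] = False

    # Sum the speech phoneme posteriors into one speech-class probability.
    p_speech = posteriors[:, speech_mask].sum(axis=1)

    # Frame-level speech/non-speech decision by simple thresholding.
    return p_speech, p_speech > threshold


# Toy example: 2 frames, 3 classes, class 0 assumed to be silence.
toy = [[0.7, 0.2, 0.1],
       [0.1, 0.3, 0.6]]
p_speech, is_speech = speech_decisions(toy, nonspeech_idx=[0])
```

In this toy input the first frame is dominated by the assumed silence class and the second by speech phonemes, so only the second frame would be kept for speaker verification under the assumed threshold.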
Keywords
autoregressive processes; cepstral analysis; error statistics; least mean squares methods; maximum likelihood estimation; multilayer perceptrons; signal detection; speaker recognition; speech processing; MFCC; MLP; SAD; autoregressive modelling technique; cepstral mean subtraction; equal error rate; frequency domain linear prediction; minimum mean square envelope estimation; modulation spectral features; multilayer perceptron; phoneme posterior probability; speaker recognition evaluation; speaker verification; speech activity detection; speech phoneme probabilities; speech signal processing; temporal envelopes; temporal segments; Acoustics; Noise; Noise measurement; Speech; Speech processing; Speech recognition; Vectors; Frequency Domain Linear Prediction (FDLP); Speaker Verification; Speech Activity Detection;
fLanguage
English
Publisher
IEEE
Conference_Titel
Applications of Signal Processing to Audio and Acoustics (WASPAA), 2011 IEEE Workshop on
Conference_Location
New Paltz, NY
ISSN
1931-1168
Print_ISBN
978-1-4577-0692-9
Electronic_ISBN
1931-1168
Type
conf
DOI
10.1109/ASPAA.2011.6082323
Filename
6082323