• DocumentCode
    2357866
  • Title
    Multi-layer perceptron based speech activity detection for speaker verification
  • Author
    Ganapathy, Sriram ; Rajan, Padmanabhan ; Hermansky, Hynek
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Johns Hopkins Univ., Baltimore, MD, USA
  • fYear
    2011
  • fDate
    16-19 Oct. 2011
  • Firstpage
    321
  • Lastpage
    324
  • Abstract
    In this paper, we present a speech activity detection (SAD) technique for speaker verification in noisy environments. The proposed SAD is based on phoneme posteriors derived from a multi-layer perceptron (MLP). The MLP is trained on modulation spectral features, in which long temporal segments of the speech signal are analyzed in critical bands. In each sub-band, temporal envelopes are derived using an autoregressive modelling technique called frequency domain linear prediction (FDLP). The sub-band envelopes are made robust by a minimum mean square envelope estimation technique. We also experiment with MFCC features processed with cepstral mean subtraction. The speech features are input to the trained MLP to estimate phoneme posterior probabilities. For SAD, the probabilities of all speech phonemes are merged into a single speech class to derive speech/non-speech decisions. The proposed SAD is applied to a speaker verification task using noisy versions of the NIST 2008 speaker recognition evaluation (SRE) data, where it provides significant improvements (a relative equal error rate (EER) improvement of about 9% in additive noise and about 19% in reverberant conditions). Furthermore, the improvements are consistent across the two front-ends (FDLP and MFCC) considered here.
  • Keywords
    autoregressive processes; cepstral analysis; error statistics; least mean squares methods; maximum likelihood estimation; multilayer perceptrons; signal detection; speaker recognition; speech processing; MFCC; MLP; SAD; autoregressive modelling technique; cepstral mean subtraction; equal error rate; frequency domain linear prediction; minimum mean square envelope estimation; modulation spectral features; multilayer perceptron; phoneme posterior probability; speaker recognition evaluation; speaker verification; speech activity detection; speech phoneme probabilities; speech signal processing; temporal envelopes; temporal segments; Acoustics; Noise; Noise measurement; Speech; Speech processing; Speech recognition; Vectors; Frequency Domain Linear Prediction (FDLP); Speaker Verification; Speech Activity Detection;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Applications of Signal Processing to Audio and Acoustics (WASPAA), 2011 IEEE Workshop on
  • Conference_Location
    New Paltz, NY
  • ISSN
    1931-1168
  • Print_ISBN
    978-1-4577-0692-9
  • Electronic_ISBN
    1931-1168
  • Type
    conf
  • DOI
    10.1109/ASPAA.2011.6082323
  • Filename
    6082323
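
The abstract describes merging the posterior probabilities of all speech phonemes into one speech class to obtain speech/non-speech decisions. The following is a minimal sketch of that merging step only, not the authors' implementation. It assumes the MLP output is available as a frames-by-classes array whose rows sum to 1; the function name `speech_decisions`, the position of the non-speech class (`nonspeech_index`), and the 0.5 threshold are all hypothetical choices made for illustration, since the record does not specify them.

```python
import numpy as np


def speech_decisions(posteriors, nonspeech_index=0, threshold=0.5):
    """Merge per-frame phoneme posteriors into a single speech probability.

    posteriors: array-like of shape (frames, classes), rows summing to 1.
    nonspeech_index: column of the non-speech class (assumed layout).
    threshold: cut-off on the merged speech probability (assumed value).
    Returns (speech_prob, is_speech), one entry per frame.
    """
    posteriors = np.asarray(posteriors, dtype=float)
    if posteriors.ndim != 2:
        raise ValueError("posteriors must have shape (frames, classes)")

    # Select every class except the non-speech one.
    speech_columns = np.ones(posteriors.shape[1], dtype=bool)
    speech_columns[nonspeech_index] = False

    # Summing the speech-phoneme posteriors merges them into one class.
    speech_prob = posteriors[:, speech_columns].sum(axis=1)
    return speech_prob, speech_prob > threshold


if __name__ == "__main__":
    # Toy posteriors for three frames: column 0 is non-speech (assumed).
    toy = [[0.9, 0.05, 0.05],
           [0.2, 0.5, 0.3],
           [0.5, 0.25, 0.25]]
    prob, flags = speech_decisions(toy)
    print(prob, flags)
```

In practice the frame-level decisions would usually be smoothed over time before being passed to the speaker verification front-end; that step is left out here because the record gives no details about it.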