• DocumentCode
    179012
  • Title

    Effective use of DCTs for contextualizing features for speaker recognition

  • Author

    McLaren, Moray ; Scheffer, Nicolas ; Ferrer, Luciana ; Yun Lei

  • Author_Institution
    Speech Technol. & Res. Lab., SRI Int., Menlo Park, CA, USA
  • fYear
    2014
  • fDate
    4-9 May 2014
  • Firstpage
    4027
  • Lastpage
    4031
  • Abstract
    This article proposes a new approach for contextualizing features for speaker recognition through the discrete cosine transform (DCT). Specifically, we apply a 2D-DCT to the Mel filterbank outputs to replace the common Mel frequency cepstral coefficients (MFCCs) appended with deltas and double deltas. A thorough comparison of algorithms for delta computation and DCT-based contextualization for speaker recognition is provided, and the effect of varying the size of the analysis window in each case is considered. Selection of 2D-DCT coefficients using a zig-zag approach permits definition of an arbitrary feature dimension using the most energized coefficients. We show that 60 coefficients computed using our approach outperform the standard MFCCs appended with double deltas by up to 25% relative on the NIST 2012 speaker recognition evaluation (SRE) corpus in both Cprimary and equal error rate (EER), while additional coefficients increase system robustness to noise.
  • Keywords
    channel bank filters; discrete cosine transforms; speaker recognition; 2D-DCT coefficient selection; 2D-DCT transform; DCT-based contextualization; EER; MFCCs; Mel filter bank outputs; Mel frequency cepstral coefficients; NIST 2012 speaker recognition evaluation corpus; SRE; analysis window size; arbitrary feature dimension; contextualizing features; discrete cosine transform; double deltas; equal error rate; most energized coefficients; speaker recognition; zig-zag approach; Discrete cosine transforms; Feature extraction; NIST; Noise measurement; Speaker recognition; Speech; Speech recognition; 2D-DCT; Contextualization; Deltas; Filterbank Energies; Speaker Recognition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
  • Conference_Location
    Florence
  • Type

    conf

  • DOI
    10.1109/ICASSP.2014.6854358
  • Filename
    6854358
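
The contextualization described in the abstract (a 2D-DCT over a window of Mel filterbank outputs, followed by zig-zag coefficient selection) can be sketched as below. This is a minimal illustration and not the authors' implementation. Everything beyond the abstract is an assumption: the function names (`dct2_matrix`, `dct_2d`, `zigzag_indices`, `contextualize`) are hypothetical, the DCT is the orthonormal DCT-II, the zig-zag order is the JPEG-style anti-diagonal scan, and the input block is taken to hold log filterbank energies with one row per channel and one column per frame in the analysis window.

```python
import math


def dct2_matrix(n):
    """Orthonormal DCT-II basis as an n x n list of rows (assumed variant)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (i + 0.5) * k / n)
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def dct_2d(block):
    """2D-DCT of a block: rows are filterbank channels, columns are frames."""
    n_rows, n_cols = len(block), len(block[0])
    basis_rows = dct2_matrix(n_rows)
    basis_cols = dct2_matrix(n_cols)
    basis_cols_t = [list(col) for col in zip(*basis_cols)]
    # Transform along the channel axis, then along the time axis.
    return matmul(matmul(basis_rows, block), basis_cols_t)


def zigzag_indices(n_rows, n_cols):
    """(row, col) pairs scanned along anti-diagonals, alternating direction."""
    order = []
    for s in range(n_rows + n_cols - 1):
        diag = [(r, s - r) for r in range(n_rows) if 0 <= s - r < n_cols]
        if s % 2 == 0:
            diag.reverse()
        order.extend(diag)
    return order


def contextualize(block, n_coeffs):
    """Keep the first n_coeffs 2D-DCT coefficients in zig-zag order."""
    coeffs = dct_2d(block)
    picked = zigzag_indices(len(block), len(block[0]))[:n_coeffs]
    return [coeffs[r][c] for r, c in picked]


if __name__ == "__main__":
    # Toy block: 4 channels x 5 frames of made-up log energies (not real data).
    toy = [[float(ch + fr) for fr in range(5)] for ch in range(4)]
    print(contextualize(toy, 6))
```

Because the zig-zag scan visits low-order coefficients first, changing `n_coeffs` changes the feature dimension without touching the transform itself, which mirrors the "arbitrary feature dimension" point in the abstract. The window length and number of filterbank channels used in the paper are not stated in this record, so the toy sizes above are placeholders.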