Title :
Multiple windowed spectral features for emotion recognition
Author :
Attabi, Yazid ; Alam, Mohammad Jahangir ; Dumouchel, P. ; Kenny, P. ; O'Shaughnessy, D.
Author_Institution :
Centre de Rech. Inf. de Montreal, Montréal, QC, Canada
Abstract :
MFCC (mel-frequency cepstral coefficients) and PLP (perceptual linear prediction) coefficients, or RASTA-PLP, have demonstrated good results both when used in combination with prosodic features as suprasegmental (long-term) information and when used stand-alone as segmental (short-time) information. MFCC and PLP feature parameterization aims to represent the speech parameters in a way similar to how sound is perceived by humans. However, MFCC and PLP are usually computed from a Hamming-windowed periodogram spectrum estimate, which is characterized by large variance. In this paper, we study the effect on emotion recognition performance of averaging spectral estimates obtained using a set of orthogonal tapers (windows). The multitaper MFCC and PLP features are examined separately as short-time information vectors modeled using Gaussian mixture models (GMMs). When tested on the FAU AIBO spontaneous emotion corpus, multiple-windowed spectral features achieve a relative improvement of 2.2% to 3.9% over single-windowed ones for both the MFCC and PLP systems.
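The contrast between a single Hamming-windowed periodogram and an average over orthogonal tapers can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which taper family or weighting was used, so the sine tapers, the uniform averaging weights, the frame length, the taper count, and all function names below are assumptions made for illustration. A plain DFT from the standard library is used to keep the sketch self-contained.

```python
import cmath
import math
import random


def sine_tapers(n, k):
    """Return k orthonormal sine tapers of length n (assumed taper family)."""
    scale = math.sqrt(2.0 / (n + 1))
    return [[scale * math.sin(math.pi * (j + 1) * (i + 1) / (n + 1))
             for i in range(n)]
            for j in range(k)]


def hamming(n):
    """Return a symmetric Hamming window scaled to unit energy."""
    w = [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w]


def power_spectrum(frame, window):
    """Windowed periodogram |DFT(frame * window)|^2 for bins 0..n/2."""
    n = len(frame)
    x = [s * w for s, w in zip(frame, window)]
    spec = []
    for f in range(n // 2 + 1):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
        spec.append(abs(acc) ** 2)
    return spec


def multitaper_spectrum(frame, tapers):
    """Average the periodograms of all tapers (uniform weights assumed)."""
    specs = [power_spectrum(frame, t) for t in tapers]
    return [sum(col) / len(tapers) for col in zip(*specs)]


def bin_variance(spec):
    """Variance of a spectrum estimate across its frequency bins."""
    mean = sum(spec) / len(spec)
    return sum((v - mean) ** 2 for v in spec) / len(spec)


if __name__ == "__main__":
    random.seed(0)
    n, k = 64, 6  # hypothetical frame length and number of tapers
    frame = [random.gauss(0.0, 1.0) for _ in range(n)]  # white-noise frame
    single = power_spectrum(frame, hamming(n))
    multi = multitaper_spectrum(frame, sine_tapers(n, k))
    print("Hamming periodogram bin variance:", bin_variance(single))
    print("Multitaper estimate bin variance:", bin_variance(multi))
```

For white noise the true spectrum is flat, so the spread of an estimate across frequency bins gives a rough picture of its variance. In a feature pipeline, the averaged spectrum would take the place of the single-window periodogram before the mel filterbank (MFCC) or the perceptual processing stages (PLP).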
Keywords :
Gaussian processes; cepstral analysis; emotion recognition; estimation theory; feature extraction; prediction theory; FAU AIBO spontaneous emotion corpus; GMM; Gaussian mixture model; Hamming-windowed periodogram spectrum estimation; MFCC; RASTA-PLP; averaging spectral estimation effect; mel frequency cepstral coefficient; multiple windowed spectral feature; orthogonal taper; perceptual linear prediction coefficient; prosodic feature; speech parameter representation; suprasegmental information; Spectral analysis; Speech; Speech processing; Speech recognition; PLP; multitaper spectrum
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
DOI :
10.1109/ICASSP.2013.6639126