Regional Information Center for Science and Technology (RICEST) - Multiple windowed spectral features for emotion recognition

DocumentCode :

1691163

Title :

Multiple windowed spectral features for emotion recognition

Author :

Attabi, Yazid ; Alam, Mohammad Jahangir ; Dumouchel, P. ; Kenny, P. ; O'Shaughnessy, D.

Author_Institution :

Centre de Rech. Inf. de Montreal, Montréal, QC, Canada

fYear :

2013

Firstpage :

7527

Lastpage :

7531

Abstract :

MFCC (Mel Frequency Cepstral Coefficients) and PLP (Perceptual Linear Prediction coefficients), or RASTA-PLP, have demonstrated good results both when used in combination with prosodic features as suprasegmental (long-term) information and when used stand-alone as segmental (short-time) information. MFCC and PLP feature parameterization aims to represent the speech parameters in a way similar to how humans perceive sound. However, MFCC and PLP are usually computed from a Hamming-windowed periodogram spectrum estimate, which is characterized by large variance. In this paper we study the effect on emotion recognition performance of averaging spectral estimates obtained using a set of orthogonal tapers (windows). The multitaper MFCC and PLP are examined separately as short-time information vectors modeled using Gaussian mixture models (GMMs). When tested on the FAU AIBO spontaneous emotion corpus, multiple windowed spectral features achieve a relative improvement ranging from 2.2% to 3.9% over single windowed ones for both MFCC and PLP systems.
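
The contrast the abstract draws between a single Hamming-windowed periodogram and an average over orthogonal tapers can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the sine taper family, the uniform averaging weights, the frame length of 400 samples, the choice of 6 tapers, and all function names. The record does not state which tapers or weights the authors used.

```python
import numpy as np


def sine_tapers(frame_len, num_tapers):
    """Return an orthonormal set of sine tapers, shape (num_tapers, frame_len).

    Assumption: sine tapers are used here only because they are easy to
    write down; the paper's actual taper family is not given in this record.
    """
    n = np.arange(1, frame_len + 1)
    k = np.arange(1, num_tapers + 1)[:, None]
    return np.sqrt(2.0 / (frame_len + 1)) * np.sin(np.pi * k * n / (frame_len + 1))


def hamming_periodogram(frame):
    """Single-taper estimate: power spectrum of a Hamming-windowed frame."""
    window = np.hamming(len(frame))
    window = window / np.sqrt(np.sum(window ** 2))  # normalize to unit energy
    return np.abs(np.fft.rfft(frame * window)) ** 2


def multitaper_spectrum(frame, num_tapers=6):
    """Multitaper estimate: uniform average of the per-taper power spectra."""
    tapers = sine_tapers(len(frame), num_tapers)
    per_taper = np.abs(np.fft.rfft(tapers * frame, axis=1)) ** 2
    return per_taper.mean(axis=0)


def compare_variance(num_frames=300, frame_len=400, num_tapers=6, seed=0):
    """Estimate the variance of both estimators on white-noise frames.

    Returns (hamming_variance, multitaper_variance), each averaged over
    frequency bins.
    """
    rng = np.random.default_rng(seed)
    frames = rng.standard_normal((num_frames, frame_len))
    ham = np.array([hamming_periodogram(f) for f in frames])
    mt = np.array([multitaper_spectrum(f, num_tapers) for f in frames])
    return ham.var(axis=0).mean(), mt.var(axis=0).mean()
```

Calling `compare_variance()` on synthetic white noise should show a clearly smaller variance for the multitaper estimate than for the single-window one. In a feature pipeline, the averaged spectrum would simply take the place of the periodogram before the mel filterbank (for MFCC) or the perceptual processing stages (for PLP).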

Keywords :

Gaussian processes; cepstral analysis; emotion recognition; estimation theory; feature extraction; prediction theory; FAU AIBO spontaneous emotion corpus; GMM; Gaussian mixture model; Hamming-windowed periodogram spectrum estimation; MFCC; RASTA-PLP; averaging spectral estimation effect; mel frequency cepstral coefficient; multiple windowed spectral feature; orthogonal taper; perceptual linear prediction coefficient; prosodic feature; speech parameter representation; suprasegmental information; Spectral analysis; Speech; Speech processing; Speech recognition; PLP; multitaper spectrum

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on

Conference_Location :

Vancouver, BC

ISSN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.2013.6639126

Filename :

6639126

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1691163