Title :
Emotion detection in speech using deep networks
Author :
Amer, Moh R. ; Siddiquie, Behjat ; Richey, Colleen ; Divakaran, Ajay
Author_Institution :
SRI Int., Princeton, NJ, USA
Abstract :
We propose a novel staged hybrid model for emotion detection in speech. Hybrid models exploit the strength of discriminative classifiers along with the representational power of generative models. Discriminative classifiers have been shown to achieve higher performance than the corresponding generative likelihood-based classifiers. On the other hand, generative models learn rich, informative representations. Our proposed hybrid model consists of a generative model, which is used for unsupervised representation learning of short-term temporal phenomena, and a discriminative model, which is used for event detection and classification of long-range temporal dynamics. We evaluate our approach on multiple audio-visual datasets (AVEC, VAM, and SPD) and demonstrate its superiority over the state of the art.
Keywords :
Boltzmann machines; emotion recognition; image classification; object detection; speech recognition; unsupervised learning; deep networks; discriminative classifiers; emotion detection; event detection; generative likelihood-based classifiers; generative models; human speech; long range temporal dynamic classification; multiple audio-visual datasets; restricted Boltzmann machines; short term temporal phenomena; unsupervised representation learning; Emotion recognition; Feature extraction; Hidden Markov models; Hybrid power systems; Speech; Speech recognition; Vectors; CRBMs; CRF; Deep Networks; Emotion Recognition; Hybrid Models
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
DOI :
10.1109/ICASSP.2014.6854297