DocumentCode :
259410
Title :
Efficient Recognition of Human Emotional States from Audio Signals
Author :
Erdem, Ernur Sonat ; Sert, Mustafa
Author_Institution :
Dept. of Comput. Eng., Baskent Univ., Ankara, Turkey
fYear :
2014
fDate :
10-12 Dec. 2014
Firstpage :
139
Lastpage :
142
Abstract :
Automatic recognition of human emotional states is an important task for efficient human-machine communication. Most existing work focuses on recognizing emotional states from audio signals alone, visual signals alone, or both. Here we propose empirical methods for feature extraction and classifier optimization that consider the temporal aspects of audio signals, and we introduce a framework to efficiently recognize human emotional states from audio signals. The framework is based on the prediction of input audio clips that are described using representative low-level features. In the experiments, seven (7) discrete emotional states (anger, fear, boredom, disgust, happiness, sadness, and neutral) from the EmoDB dataset are recognized and tested based on nineteen (19) audio features (15 standalone, 4 joint) using a Support Vector Machine (SVM) classifier. Extensive experiments have been conducted to demonstrate the effect of the feature extraction and classifier optimization methods on the recognition accuracy of the emotional states. Our experiments show that the feature extraction and classifier optimization procedures improve emotion recognition accuracy by over 11 percentage points: the overall recognition accuracy achieved for the seven emotions in the EmoDB dataset is 83.33%, compared to the baseline accuracy of 72.22%.
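The improvement quoted in the abstract can be read either as an absolute or as a relative gain. The short sketch below works through both readings using only the two accuracies reported above (72.22% baseline, 83.33% optimized). The function name `accuracy_gain` is a hypothetical helper introduced for illustration; it is not part of the paper.

```python
def accuracy_gain(baseline_pct, improved_pct):
    """Return (absolute gain in percentage points, relative gain in percent).

    Hypothetical helper for illustration only; inputs are accuracies in percent.
    """
    absolute = improved_pct - baseline_pct          # difference in percentage points
    relative = 100.0 * absolute / baseline_pct      # gain relative to the baseline
    return absolute, relative


# Accuracies as reported in the abstract.
absolute, relative = accuracy_gain(72.22, 83.33)
print(f"absolute gain: {absolute:.2f} percentage points")  # 11.11
print(f"relative gain: {relative:.2f}% over the baseline")  # 15.38
```

Under this reading, the "over 11%" figure in the abstract matches the absolute difference between the two accuracies rather than the relative one.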
Keywords :
audio signal processing; emotion recognition; feature extraction; optimisation; signal classification; support vector machines; EmoDB dataset; SVM classifier; anger; audio features; audio signal temporal aspects; boredom; classifier optimization; discrete emotional states; disgust; empirical methods; fear; feature extraction; happiness; human emotional state recognition; human-machine communication; input audio clip prediction; neutral; representative low-level features; sadness; support vector machine classifier; visual signals; Accuracy; Emotion recognition; Feature extraction; Joints; Mel frequency cepstral coefficient; Support vector machines; Vectors; audio based emotion recognition; affective co;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Multimedia (ISM), 2014 IEEE International Symposium on
Conference_Location :
Taichung
Print_ISBN :
978-1-4799-4312-8
Type :
conf
DOI :
10.1109/ISM.2014.81
Filename :
7033010