Title of article :
Speech Emotion Recognition using Enriched Spectrogram and Deep Convolutional Neural Network Transfer Learning
Author/Authors :
Mansouri, Bibi Zahra Electrical and Computer Engineering Department - Islamic Azad University, Ferdows Branch; Ghaffary, Hamid Reza Electrical and Computer Engineering Department - Islamic Azad University, Ferdows Branch; Harimi, Ali Electrical and Computer Engineering Department - Islamic Azad University, Ferdows Branch
Abstract :
Speech emotion recognition (SER) is a challenging field of research that has attracted attention during the last two decades. Feature extraction has been reported as the most challenging issue in SER systems. Deep neural networks have partially solved this problem in some other applications. To address it, we propose a novel enriched spectrogram calculated from the fusion of wide-band and narrow-band spectrograms. The proposed spectrogram benefits from both high temporal and high spectral resolution. We then apply the resulting spectrogram images to the pre-trained deep convolutional neural network ResNet152. In place of the last layer of ResNet152, we add five additional layers to adapt the model to the present task. All experiments are performed on the popular EmoDB dataset using the leave-one-speaker-out technique, which guarantees the speaker independence of the model. The model achieves an accuracy of 88.97%, which shows the efficiency of the proposed approach compared with other state-of-the-art methods.
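The speaker-independent evaluation mentioned in the abstract can be illustrated with a minimal sketch of a leave-one-speaker-out split. This is not the authors' code: the function names `leave_one_speaker_out` and `loso_accuracy`, the `train_and_predict` callback, and the toy speaker IDs are all assumptions made for illustration. The idea is only that every utterance of one speaker is held out for testing while the model is trained on the remaining speakers, and this is repeated for each speaker.

```python
from collections import defaultdict


def leave_one_speaker_out(speaker_ids):
    """Yield (held_out_speaker, train_indices, test_indices), one fold per speaker.

    speaker_ids[i] is the speaker label of utterance i (hypothetical input format).
    """
    by_speaker = defaultdict(list)
    for idx, spk in enumerate(speaker_ids):
        by_speaker[spk].append(idx)

    for held_out in sorted(by_speaker):
        test_idx = by_speaker[held_out]
        # Training data never contains an utterance from the held-out speaker.
        train_idx = sorted(
            i for spk, idxs in by_speaker.items() if spk != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx


def loso_accuracy(speaker_ids, labels, train_and_predict):
    """Pool predictions over all folds and return the overall accuracy.

    train_and_predict(train_idx, test_idx) is a hypothetical callback that
    trains a fresh model on train_idx and returns one prediction per test index.
    """
    correct = 0
    for _, train_idx, test_idx in leave_one_speaker_out(speaker_ids):
        preds = train_and_predict(train_idx, test_idx)
        correct += sum(p == labels[i] for p, i in zip(preds, test_idx))
    return correct / len(labels)


# Toy usage: four utterances from three made-up speakers.
toy_speakers = ["a", "b", "a", "c"]
for speaker, train_idx, test_idx in leave_one_speaker_out(toy_speakers):
    print(speaker, train_idx, test_idx)
```

Whether the reported accuracy is pooled over all utterances, as in this sketch, or averaged per speaker is not stated in the abstract; the pooled form is an assumption.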
Keywords :
Wideband and Narrowband Spectrogram , ResNet152 , DCNN , Transfer Learning , Speech Emotion Recognition
Journal title :
Journal of Artificial Intelligence and Data Mining