Title of article
Speech Emotion Recognition Using Scalogram Based Deep Structure
Author/Authors
Aghajani, K. Department of Engineering and Technology - University of Mazandaran, Babolsar, Iran; Esmaili Paeen Afrakoti, I. Department of Engineering and Technology - University of Mazandaran, Babolsar, Iran
Pages
8
From page
285
To page
292
Abstract
Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on extracting features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking style, and environment. Here, an SER method is proposed based on a Convolutional Neural Network (CNN) concatenated with a Recurrent Neural Network (RNN). CNNs can learn local salient features from speech signals, images, and videos, while RNNs have been used in many sequential data processing tasks to learn long-term dependencies between local features. Combining the two exploits the strengths of both networks. In the proposed method, the CNN is applied directly to a scalogram of the speech signal. An attention-based RNN model is then used to learn long-term temporal relationships among the learned features. Experiments on several datasets, including RAVDESS, SAVEE, and Emo-DB, demonstrate the effectiveness of the proposed SER method.
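As a rough illustration of the scalogram input mentioned in the abstract, the sketch below computes the magnitude of a continuous wavelet transform by direct summation. The Morlet wavelet, the value w0 = 6, the scale grid, and the function names (morlet, scalogram, dominant_scale) are assumptions chosen for illustration; this record does not state which wavelet or settings the authors used, and the CNN and attention-based RNN stages are not shown.

```python
import cmath
import math


def morlet(t, w0=6.0):
    """Complex Morlet wavelet at time t (small correction term omitted)."""
    return math.pi ** -0.25 * cmath.exp(1j * w0 * t) * math.exp(-0.5 * t * t)


def scalogram(signal, scales, dt=1.0, w0=6.0):
    """Return |CWT| as a list of rows: one row per scale, one column per sample.

    Assumption: samples outside the signal are treated as zero.
    """
    n = len(signal)
    rows = []
    for s in scales:
        # The Gaussian envelope is negligible beyond about 4 scale units.
        half = int(math.ceil(4.0 * s / dt))
        offsets = range(-half, half + 1)
        # Conjugated, dilated wavelet with 1/sqrt(s) normalisation.
        kernel = [
            morlet(j * dt / s, w0).conjugate() * dt / math.sqrt(s)
            for j in offsets
        ]
        row = []
        for b in range(n):
            acc = 0j
            for j, w in zip(offsets, kernel):
                k = b + j
                if 0 <= k < n:
                    acc += signal[k] * w
            row.append(abs(acc))
        rows.append(row)
    return rows


def dominant_scale(signal, scales):
    """Return the scale with the largest magnitude at the middle sample."""
    rows = scalogram(signal, scales)
    mid = len(signal) // 2
    best = max(range(len(scales)), key=lambda i: rows[i][mid])
    return scales[best]


if __name__ == "__main__":
    # Hypothetical test tone; a real pipeline would use framed speech audio.
    tone = [math.sin(0.38 * k) for k in range(256)]
    print(dominant_scale(tone, [2, 4, 8, 16, 32]))
```

In a pipeline like the one described, the rows of such a scalogram would form a two-dimensional time-scale image that a CNN can process; an optimised library routine would normally replace this direct summation for real audio.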
Keywords
Continuous Wavelet Transform, Emotion Recognition, Convolutional Neural Network, Recurrent Network, Long Short-Term Memory
Journal title
International Journal of Engineering
Serial Year
2020
Record number
2553526