Title of article

Speech Emotion Recognition Using Scalogram Based Deep Structure

Author/Authors

Aghajani, K., Department of Engineering and Technology - University of Mazandaran, Babolsar, Iran; Esmaili Paeen Afrakoti, I., Department of Engineering and Technology - University of Mazandaran, Babolsar, Iran

Pages

From page

285

To page

292

Abstract

Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on extracting features and training an appropriate classifier. However, most of these features can be affected by emotionally irrelevant factors such as gender, speaking style, and environment. Here, an SER method is proposed based on a Convolutional Neural Network (CNN) concatenated with a Recurrent Neural Network (RNN). CNNs can learn local salient features from speech signals, images, and videos, while RNNs have been used in many sequential data processing tasks to learn long-term dependencies between local features. Combining the two takes advantage of the strengths of both networks. In the proposed method, the CNN is applied directly to a scalogram of the speech signal. An attention-mechanism-based RNN model is then used to learn long-term temporal relationships among the learned features. Experiments on datasets such as RAVDESS, SAVEE, and Emo-DB demonstrate the effectiveness of the proposed SER method.
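
As an illustration of the scalogram input mentioned in the abstract, the following is a minimal sketch of a continuous wavelet transform computed with NumPy. The record does not state which wavelet or scales the authors used, so the complex Morlet wavelet, the parameter `w0 = 6.0`, the truncation at four envelope widths, and the function name `morlet_scalogram` are all assumptions made for this example, not the article's implementation.

```python
import numpy as np


def morlet_scalogram(signal, scales, w0=6.0):
    """Return a (len(scales), len(signal)) array of squared CWT magnitudes.

    Assumption: a complex Morlet wavelet with centre parameter w0; the
    source record does not specify the wavelet used in the article.
    Scales are expressed in samples.
    """
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    out = np.empty((len(scales), n))
    for i, s in enumerate(scales):
        # Truncate the wavelet at +/- 4 widths of its Gaussian envelope.
        half = int(np.ceil(4 * s))
        t = np.arange(-half, half + 1) / s
        wavelet = (np.pi ** -0.25) * np.exp(1j * w0 * t - 0.5 * t ** 2) / np.sqrt(s)
        # Correlate the signal with the wavelet: convolve with the
        # time-reversed complex conjugate, then crop to the signal length.
        full = np.convolve(signal, np.conj(wavelet)[::-1], mode="full")
        coeffs = full[half:half + n]
        out[i] = np.abs(coeffs) ** 2
    return out
```

Each row of the returned array corresponds to one scale and each column to one time sample, so the result can be treated as a single-channel image and passed to a 2-D convolutional network. For a sinusoid, the energy concentrates in the rows whose scale is close to `w0 / (2 * pi * f)`, where `f` is the frequency in cycles per sample.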

Keywords

Continuous Wavelet Transform, Emotion Recognition, Convolutional Neural Network, Recurrent Network, Long Short-Term Memory

Journal title

International Journal of Engineering

Serial Year

2020

Record number

2553526

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=10&DC=2553526