Recognizing emotion from singing and speaking using shared models

Author

Biqiao Zhang;Georg Essl;Emily Mower Provost

Author_Institution

University of Michigan, Ann Arbor, Ann Arbor, Michigan, USA

fYear

2015

Firstpage

139

Lastpage

145

Abstract

Speech and song are two closely related types of vocal communication. While significant progress has been made in both speech and music emotion recognition, few works have concentrated on building a shared emotion recognition model for both speech and song. In this paper, we propose three shared emotion recognition models for speech and song: a simple model, a single-task hierarchical model, and a multi-task hierarchical model. We study the commonalities and differences in emotion expression across these two communication domains. We compare performance across different settings, investigate the relationship between evaluator agreement rate and classification accuracy, and analyze the classification performance of individual feature groups. Our results show that the multi-task model classifies emotion more accurately than the single-task models when the same set of features is used. This suggests that although spoken and sung emotion recognition are different tasks, they are related and can be considered together. The results also show that utterances with lower evaluator agreement rates and emotions with low activation benefit the most from multi-task learning. Visual features appear to be more similar across spoken and sung emotion expression than acoustic features.

Keywords

"Speech","Emotion recognition","Speech recognition","Acoustics","Support vector machines","Feature extraction","Visualization"

Publisher

IEEE

Conference_Titel

Affective Computing and Intelligent Interaction (ACII), 2015 International Conference on

Electronic_ISBN

2156-8111

Type

conf

DOI

10.1109/ACII.2015.7344563

Filename

7344563