DocumentCode
3703306
Title
Recognizing emotion from singing and speaking using shared models
Author
Biqiao Zhang;Georg Essl;Emily Mower Provost
Author_Institution
University of Michigan, Ann Arbor, Michigan, USA
fYear
2015
Firstpage
139
Lastpage
145
Abstract
Speech and song are two closely related forms of vocal communication. While significant progress has been made in both speech and music emotion recognition, few works have concentrated on building a shared emotion recognition model for both speech and song. In this paper, we propose three shared emotion recognition models for speech and song: a simple model, a single-task hierarchical model, and a multi-task hierarchical model. We study the commonalities and differences in emotion expression across these two communication domains. We compare performance across different settings, investigate the relationship between evaluator agreement rate and classification accuracy, and analyze the classification performance of individual feature groups. Our results show that the multi-task model classifies emotion more accurately than the single-task models when the same set of features is used. This suggests that although spoken and sung emotion recognition are different tasks, they are related and can be considered together. The results also show that utterances with lower agreement rates and emotions with low activation benefit the most from multi-task learning. Visual features appear to be more similar across spoken and sung emotion expression than acoustic features.
Keywords
"Speech","Emotion recognition","Speech recognition","Acoustics","Support vector machines","Feature extraction","Visualization"
Publisher
IEEE
Conference_Titel
Affective Computing and Intelligent Interaction (ACII), 2015 International Conference on
Electronic_ISBN
2156-8111
Type
conf
DOI
10.1109/ACII.2015.7344563
Filename
7344563