DocumentCode :
2358009
Title :
Learning emotion-based acoustic features with deep belief networks
Author :
Schmidt, Erik M. ; Kim, Youngmoo E.
Author_Institution :
Music & Entertainment Technol. Lab. (MET-Lab.), Drexel Univ., Philadelphia, PA, USA
fYear :
2011
fDate :
16-19 Oct. 2011
Firstpage :
65
Lastpage :
68
Abstract :
The medium of music has evolved specifically for the expression of emotions, and it is natural for us to organize music in terms of its emotional associations. But while such organization is a natural process for humans, quantifying it empirically proves to be a very difficult task, and as such no dominant feature representation for music emotion recognition has yet emerged. Much of the difficulty in developing emotion-based features stems from the ambiguity of the ground truth. Even using the smallest time window, opinions on the emotion are bound to vary and reflect some disagreement between listeners. In previous work, we have modeled human response labels to music in the arousal-valence (A-V) representation of affect as a time-varying, stochastic distribution. Current methods for automatic detection of emotion in music seek performance increases by combining several feature domains (e.g. loudness, timbre, harmony, rhythm). Such work has focused largely on dimensionality reduction for minor classification performance gains, but has provided little insight into the relationship between audio and emotional associations. In this new work, we seek to employ regression-based deep belief networks to learn features directly from magnitude spectra. While the system is applied to the specific problem of music emotion recognition, it could easily be applied to any regression-based audio feature learning problem.
Keywords :
audio signal processing; belief networks; emotion recognition; music; acoustic features; audio feature learning problem; expression of emotions; learning emotion; magnitude spectra; music emotion recognition; regression based deep belief networks; Emotion recognition; Humans; Machine learning; Music; Topology; Training; deep belief networks; feature learning; regression
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Applications of Signal Processing to Audio and Acoustics (WASPAA), 2011 IEEE Workshop on
Conference_Location :
New Paltz, NY
ISSN :
1931-1168
Print_ISBN :
978-1-4577-0692-9
Electronic_ISBN :
1931-1168
Type :
conf
DOI :
10.1109/ASPAA.2011.6082328
Filename :
6082328