• DocumentCode
    1798164
  • Title

    Transfer learning emotion manifestation across music and speech

  • Author

    Coutinho, Eduardo ; Deng, Jun ; Schuller, Bjorn

  • Author_Institution
    Inst. for Human-Machine Commun., Tech. Univ. München, München, Germany
  • fYear
    2014
  • fDate
    6-11 July 2014
  • Firstpage
    3592
  • Lastpage
    3598
  • Abstract
    In this article, we focus on time-continuous predictions of emotion in music and speech, and the transfer of learning from one domain to the other. First, we compare the use of Recurrent Neural Networks (RNN) with standard hidden units (Simple Recurrent Network - SRN) and Long Short-Term Memory (LSTM) blocks for intra-domain acoustic emotion recognition. We show that LSTM networks outperform SRN, and we explain, on average, 74%/59% (music) and 42%/29% (speech) of the variance in Arousal/Valence. Next, we evaluate whether cross-domain predictions of emotion are a viable option for acoustic emotion recognition, and we test the use of Transfer Learning (TL) for feature space adaptation. On average, our models are able to explain 70%/43% (music) and 28%/11% (speech) of the variance in Arousal/Valence. Overall, results indicate a good cross-domain generalization performance, particularly for the model trained on speech and tested on music without pre-encoding of the input features. To the best of our knowledge, this is the first demonstration of cross-modal time-continuous predictions of emotion in the acoustic domain.
  • Keywords
    emotion recognition; generalisation (artificial intelligence); learning (artificial intelligence); music; recurrent neural nets; speech recognition; LSTM blocks; LSTM networks; RNN; SRN; acoustic domain; arousal; cross-domain generalization performance; cross-domain predictions; feature space adaptation; intradomain acoustic emotion recognition; long-short term memory blocks; music; recurrent neural networks; simple recurrent network; standard hidden units; transfer learning emotion manifestation; valence; Adaptation models; Emotion recognition; Music; Predictive models; Speech; Training;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks (IJCNN), 2014 International Joint Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4799-6627-1
  • Type

    conf

  • DOI
    10.1109/IJCNN.2014.6889814
  • Filename
    6889814