• DocumentCode
    3736739
  • Title
    Towards real-time Speech Emotion Recognition using deep neural networks
  • Author
    H.M. Fayek;M. Lech;L. Cavedon

  • Author_Institution
    School of Electrical and Computer Engineering, RMIT University, Melbourne, Victoria 3001, Australia
  • fYear
    2015
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    Most existing Speech Emotion Recognition (SER) systems rely on turn-wise processing, which aims at recognizing emotions from complete utterances, and on an overly complicated pipeline marred by many preprocessing steps and hand-engineered features. To overcome both drawbacks, we propose a real-time SER system based on end-to-end deep learning. Namely, a Deep Neural Network (DNN) that recognizes emotions from a one-second frame of raw speech spectrograms is presented and investigated. This is achievable due to a deep hierarchical architecture, data augmentation, and sensible regularization. Promising results are reported on two databases: the eNTERFACE database and the Surrey Audio-Visual Expressed Emotion (SAVEE) database.
  • Keywords
    "Databases","Speech recognition","Emotion recognition","Speech","Training","Neurons","Neural networks"
  • Publisher
    IEEE
  • Conference_Titel
    Signal Processing and Communication Systems (ICSPCS), 2015 9th International Conference on
  • Type
    conf
  • DOI
    10.1109/ICSPCS.2015.7391796
  • Filename
    7391796