• DocumentCode
    652739
  • Title

    Hybrid Deep Neural Network--Hidden Markov Model (DNN-HMM) Based Speech Emotion Recognition

  • Author

    Longfei Li ; Yong Zhao ; Dongmei Jiang ; Yanning Zhang ; Fengna Wang ; Gonzalez, Ivan ; Valentin, Emmanuel ; Sahli, Hichem

  • Author_Institution
    VUB-NPU Joint AVSP Res. Lab., Northwestern Polytech. Univ., Xi'an, China
  • fYear
    2013
  • fDate
    2-5 Sept. 2013
  • Firstpage
    312
  • Lastpage
    317
  • Abstract
    Deep Neural Network Hidden Markov Models (DNN-HMMs) have recently emerged as very promising acoustic models, achieving better speech recognition results than Gaussian mixture model based HMMs (GMM-HMMs). In this paper, for emotion recognition from speech, we investigate DNN-HMMs with restricted Boltzmann machine (RBM) based unsupervised pre-training, and DNN-HMMs with discriminative pre-training. Emotion recognition experiments are carried out with these two models on the eNTERFACE'05 database and the Berlin database, and the results are compared with those from GMM-HMMs, shallow-NN-HMMs with two layers, and multi-layer perceptron HMMs (MLP-HMMs). Experimental results show that when the number of hidden layers and the number of hidden units are properly set, the DNN can extend the labeling ability of the GMM-HMM. Among all the models, the DNN-HMMs with discriminative pre-training obtain the best results. For example, on the eNTERFACE'05 database, their recognition accuracy is 12.22% higher than that of the DNN-HMMs with unsupervised pre-training, 11.67% higher than that of the GMM-HMMs, 10.56% higher than that of the MLP-HMMs, and 17.22% higher than that of the shallow-NN-HMMs.
  • Keywords
    Boltzmann machines; emotion recognition; hidden Markov models; speech recognition; unsupervised learning; Berlin database; DNN-HMM; GMM-HMM; Gaussian mixture model based HMM; RBM; acoustic models; discriminative pre-training; eNTERFACE 05 database; hybrid deep neural network-hidden Markov model based speech emotion recognition; multilayer perceptrons HMM; restricted Boltzmann Machine; unsupervised pre-training; Databases; Emotion recognition; Hidden Markov models; Neural networks; Speech; Speech recognition; Training;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Affective Computing and Intelligent Interaction (ACII), 2013 Humaine Association Conference on
  • Conference_Location
    Geneva
  • ISSN
    2156-8103
  • Type

    conf

  • DOI
    10.1109/ACII.2013.58
  • Filename
    6681449