• DocumentCode
    241399
  • Title
    Optimized multi-channel deep neural network with 2D graphical representation of acoustic speech features for emotion recognition
  • Author
    Stolar, Melissa N.; Lech, Margaret; Burnett, Ian S.
  • Author_Institution
    Sch. of Electr. & Comput. Eng., RMIT Univ., Melbourne, VIC, Australia
  • fYear
    2014
  • fDate
    15-17 Dec. 2014
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    This study investigates the effectiveness of speech emotion recognition using a new approach called the Optimized Multi-Channel Deep Neural Network (OMC-DNN). The proposed method was tested with input features given as simple 2D black-and-white images representing graphs of the MFCC coefficients or the TEO parameters, calculated either from speech (MFCC-S, TEO-S) or from glottal waveforms (MFCC-G, TEO-G). A comparison with 6 different single-channel benchmark classifiers showed that the OMC-DNN provided the best performance in both pair-wise (emotion vs. neutral) recognition and simultaneous multiclass recognition of 7 emotions (anger, boredom, disgust, happiness, fear, sadness and neutral). In the pair-wise case, the OMC-DNN outperformed the single-channel DNN by 5%-10%, depending on the feature set. In the multiclass case, the OMC-DNN outperformed or matched the single-channel equivalents for all features. The speech spectrum and the glottal energy characteristics were identified as two important factors in discriminating between different types of categorical emotions in speech.
  • Keywords
    acoustic signal processing; emotion recognition; neural nets; speech processing; 2D black images; 2D graphical representation; MFCC-G; MFCC-S; OMC-DNN; TEO-G; TEO-S; acoustic speech features; categorical emotions; glottal energy characteristics; optimized multichannel deep neural network; single-channel DNN; single-channel benchmark classifiers; speech emotion recognition; speech spectrum; white images; Accuracy; Artificial neural networks; Benchmark testing; Emotion recognition; Speech; Speech recognition; 2D features; deep neural network; emotion recognition; multichannel speech classification
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal Processing and Communication Systems (ICSPCS), 2014 8th International Conference on
  • Conference_Location
    Gold Coast, QLD
  • Type
    conf
  • DOI
    10.1109/ICSPCS.2014.7021120
  • Filename
    7021120