DocumentCode
241399
Title
Optimized multi-channel deep neural network with 2D graphical representation of acoustic speech features for emotion recognition
Author
Stolar, Melissa N. ; Lech, Margaret ; Burnett, Ian S.
Author_Institution
Sch. of Electr. & Comput. Eng., RMIT Univ., Melbourne, VIC, Australia
fYear
2014
fDate
15-17 Dec. 2014
Firstpage
1
Lastpage
6
Abstract
This study investigates the effectiveness of speech emotion recognition using a new approach called the Optimized Multi-Channel Deep Neural Network (OMC-DNN). The proposed method has been tested with input features given as simple 2D black and white images representing graphs of the MFCC coefficients or the TEO parameters calculated either from speech (MFCC-S, TEO-S) or glottal waveforms (MFCC-G, TEO-G). A comparison with 6 different single-channel benchmark classifiers has shown that the OMC-DNN provided the best performance in both pair-wise (emotion vs. neutral) and simultaneous multiclass recognition of 7 emotions (anger, boredom, disgust, happiness, fear, sadness and neutral). In the pair-wise case, the OMC-DNN outperformed the single-channel DNN by 5%-10% depending on the feature set. In the multiclass case, the OMC-DNN outperformed or matched the single-channel equivalents for all features. The speech spectrum and the glottal energy characteristics were identified as two important factors in discriminating between different types of categorical emotions in speech.
Keywords
acoustic signal processing; emotion recognition; neural nets; speech processing; 2D black images; 2D graphical representation; MFCC-G; MFCC-S; OMC-DNN; TEO-G; TEO-S; acoustic speech features; categorical emotions; glottal energy characteristics; optimized multichannel deep neural network; single-channel DNN; single-channel benchmark classifiers; speech emotion recognition; speech spectrum; white images; Accuracy; Artificial neural networks; Benchmark testing; Emotion recognition; Speech; Speech recognition; 2D features; deep neural network; emotion recognition; multichannel speech classification
fLanguage
English
Publisher
IEEE
Conference_Titel
Signal Processing and Communication Systems (ICSPCS), 2014 8th International Conference on
Conference_Location
Gold Coast, QLD
Type
conf
DOI
10.1109/ICSPCS.2014.7021120
Filename
7021120