Title :
Speaker Normalisation for Speech-Based Emotion Detection
Author :
Sethu, Vidhyasaharan ; Ambikairajah, Eliathamby ; Epps, Julien
Author_Institution :
Univ. of New South Wales, Sydney
Abstract :
The focus of this paper is on speech-based emotion detection utilising only acoustic data, i.e. without using any linguistic or semantic information. This approach, however, suffers from the speaker dependency of acoustic features, which can lead to poor estimation of the statistics modelled by classifiers such as hidden Markov models (HMMs) and Gaussian mixture models (GMMs). We propose speaker-specific feature warping as a means of normalising acoustic features to overcome this speaker dependency, and compare the performance of a system that uses feature warping to one that does not. The back-end employs an HMM-based classifier that captures the temporal variation of the feature vectors by modelling it as transitions between different states. Evaluations conducted on the LDC emotional prosody speech corpus reveal a relative increase in classification accuracy of up to 20%.
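The feature warping named in the abstract corresponds to cumulative distribution mapping (also listed among the keywords): within a sliding window, each feature value is replaced by the standard-normal quantile of its rank, so every feature stream's short-time distribution matches a standard normal regardless of speaker. A minimal sketch under that interpretation — the function name, window length, and use of per-speaker MFCC-like input are illustrative, not from the paper:

```python
import numpy as np
from statistics import NormalDist


def warp_features(frames, win=301):
    """Speaker-specific feature warping via cumulative distribution mapping.

    frames: (T, D) array of acoustic features (e.g. MFCCs) from one speaker.
    Each of the D feature streams is warped so that its distribution over a
    sliding window of `win` frames approximates a standard normal, removing
    speaker-dependent shifts and scalings.
    """
    T, D = frames.shape
    inv_cdf = NormalDist().inv_cdf  # inverse standard-normal CDF
    warped = np.empty((T, D), dtype=float)
    half = win // 2
    for t in range(T):
        lo, hi = max(0, t - half), min(T, t + half + 1)
        window = frames[lo:hi]
        n = hi - lo
        for d in range(D):
            # 1-based rank of the current value within the window
            rank = int(np.sum(window[:, d] < frames[t, d])) + 1
            # map the empirical CDF position to a standard-normal quantile
            warped[t, d] = inv_cdf((rank - 0.5) / n)
    return warped
```

The warped features, rather than the raw ones, would then be fed to the HMM classifier; because every speaker's features are mapped to the same target distribution, the emotion models no longer have to absorb speaker-specific offsets.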
Keywords :
Gaussian processes; emotion recognition; hidden Markov models; pattern classification; speaker recognition; speech processing; Gaussian mixture models; HMM-based classifier; speaker dependency; speaker normalisation; speaker-specific feature warping; speech-based emotion detection; Acoustic signal detection; Australia; Classification tree analysis; Emotion recognition; Hidden Markov models; Humans; Loudspeakers; Phase estimation; Speech; Testing; Feature warping; cumulative distribution mapping; emotion detection; hidden Markov model;
Conference_Title :
2007 15th International Conference on Digital Signal Processing
Conference_Location :
Cardiff
Print_ISBN :
1-4244-0882-2
Electronic_ISBN :
1-4244-0882-2
DOI :
10.1109/ICDSP.2007.4288656