Abstract:
In this paper, we study how the number of speech features and their statistical values affect the accuracy of recognizing emotions in speech. Using a Gaussian Mixture Model (GMM), we identify two effective features, namely Mel Frequency Cepstrum Coefficients (MFCCs) and Auto Correlation Function Coefficients (ACFC), both extracted directly from the speech signal. Using a GMM supervector formed from the values of MFCCs, delta MFCCs and ACFC, we conduct experiments on the Berlin emotional database with six previously proposed emotions: anger, disgust, fear, happy, neutral and sad. Our method achieves an emotion recognition rate of 74.45%, significantly better than the 59.00% achieved previously. To demonstrate the broad applicability of our method, we also conduct experiments with a different set of emotions: anger, boredom, fear, happy, neutral and sad. Our emotion recognition rate of 75.00% is again better than the 71.00% obtained by a hidden Markov model method using MFCC, delta MFCC, cepstral coefficients and speech energy.
Keywords:
Gaussian processes; emotion recognition; hidden Markov models; speech recognition; ACFC; Berlin emotional database; GMM supervector; Gaussian mixture model; Mel frequency cepstrum coefficients; auto correlation function coefficients; cepstral coefficient; delta MFCC; feature combination; hidden Markov model; speech emotion recognition; speech energy; speech signal; statistical values; Accuracy; Correlation; Feature extraction; Mel frequency cepstral coefficient; Speech
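
As an illustration of the ACFC feature named in the abstract, the following is a minimal sketch of normalized autocorrelation coefficients computed for a single speech frame. The abstract does not give the exact ACFC definition, so the function name `autocorrelation_coefficients`, the mean removal, the normalization by the lag-0 energy and the choice of lags are all assumptions made for illustration, not the authors' method.

```python
def autocorrelation_coefficients(frame, num_lags):
    """Return normalized autocorrelation values for lags 1..num_lags.

    Hypothetical helper: mean removal and normalization by the lag-0
    energy are illustrative assumptions, not taken from the paper.
    """
    n = len(frame)
    mean = sum(frame) / n
    centered = [x - mean for x in frame]

    # Lag-0 autocorrelation (frame energy), used as the normalizer.
    energy = sum(x * x for x in centered)
    if energy == 0.0:
        # A constant frame carries no correlation structure.
        return [0.0] * num_lags

    coeffs = []
    for lag in range(1, num_lags + 1):
        # Sum of products of samples that are `lag` positions apart.
        acc = sum(centered[i] * centered[i + lag] for i in range(n - lag))
        coeffs.append(acc / energy)
    return coeffs


if __name__ == "__main__":
    # Toy frame standing in for a short window of speech samples.
    toy_frame = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
    print(autocorrelation_coefficients(toy_frame, 3))
```

In a pipeline like the one the abstract describes, such per-frame values would be combined with MFCCs and delta MFCCs before GMM modelling; that combination step is not shown here.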