Author Institution:
Department of Electrical Engineering, National Chung Cheng University, Chiayi 62102, Taiwan
Abstract:
In this work, a typical three-stage hierarchical Support Vector Machine (SVM) classifier structure is adopted to identify six affective modes (happy, angry, excited, nervous, sad and calm) from Music Television (MTV) sequences, which comprise audio and video signals. To characterize these emotional modes, the scheme uses audio features, including the spectral centroid, spectral spread, zero crossing rate, peak of zero crossing rates, duration, tempo and variance of FFT coefficients, and visual features, including color temperature and standard deviations of motion vectors. These features are extracted and used jointly, according to how their physical characteristics relate to emotion, to increase recognition accuracy. In particular, the features suited to each classification stage are identified and investigated. The experimental results demonstrate that the proposed affective recognition scheme achieves a recognition rate of 73.3%. Compared with a one-stage scheme using audio features only, the proposed scheme greatly improves recognition accuracy.
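As an illustration of three of the audio features named above, the following is a minimal Python sketch of the zero crossing rate, spectral centroid and spectral spread of a single audio frame. It uses common textbook definitions, which may differ from the exact formulations in the paper; the function names, the magnitude weighting, and the naive DFT are assumptions made for this example only.

```python
import cmath
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 (naive O(N^2) DFT, for illustration)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid_and_spread(frame, sample_rate):
    """Magnitude-weighted mean frequency and standard deviation, in Hz."""
    mags = magnitude_spectrum(frame)
    n = len(frame)
    freqs = [k * sample_rate / n for k in range(len(mags))]
    total = sum(mags)
    if total == 0:
        # Silent frame: no spectral energy to weight by.
        return 0.0, 0.0
    centroid = sum(f * m for f, m in zip(freqs, mags)) / total
    variance = sum((f - centroid) ** 2 * m for f, m in zip(freqs, mags)) / total
    return centroid, math.sqrt(variance)
```

For a pure tone, the centroid sits at the tone's frequency and the spread is close to zero; broadband or noisy frames give a larger spread. In practice an FFT routine would replace the naive DFT used here.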