• DocumentCode
    2709437
  • Title
    Combination of generative models and SVM based classifier for speech emotion recognition
  • Author
    Chandrakala, S. ; Sekhar, C. Chandra
  • Author_Institution
    Dept. of Comput. Sci. & Eng., IIT Madras, Chennai, India
  • fYear
    2009
  • fDate
    14-19 June 2009
  • Firstpage
    497
  • Lastpage
    502
  • Abstract
    Modeling time series data of varying length is important in many domains. There are two paradigms for modeling varying-length sequential data. Tasks such as speech recognition require modeling the temporal dynamics and the correlations among the features; hidden Markov models (HMMs) are used for these tasks. In tasks such as speaker recognition, audio classification and speech emotion recognition, modeling the temporal dynamics is not critical, and Gaussian mixture models (GMMs) are commonly used. Generative models such as HMMs and GMMs focus on estimating the density of the data and are not well suited to classifying data from confusable classes. Discriminative classifiers such as support vector machines (SVMs) are suited to fixed-dimensional patterns. In this paper, we propose a hybrid framework in which a generative front end represents the varying-length time series data and a discriminative model then performs classification. A score-based approach and a segment-modeling-based approach are proposed in this framework. Both approaches are applied to speech emotion recognition. Their performance is compared with that of an SVM classifier that uses different statistical features and with that of GMM classifiers that use the maximum likelihood method and the variational Bayes method for parameter estimation. Both proposed approaches outperform the methods used for comparison.
  • Keywords
    Gaussian processes; audio signal processing; emotion recognition; feature extraction; hidden Markov models; image classification; image representation; image segmentation; image sequences; speech recognition; support vector machines; time series; Gaussian mixture model; audio classification; discriminative classifier; generative model; hidden Markov model; image representation; segment modeling; speaker recognition; speech emotion recognition; support vector machine; temporal dynamic; time series data; Emotion recognition; Hidden Markov models; Hybrid power systems; Neural networks; Probability; Speaker recognition; Speech recognition; Support vector machine classification; Support vector machines; Training data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks, 2009. IJCNN 2009. International Joint Conference on
  • Conference_Location
    Atlanta, GA
  • ISSN
    1098-7576
  • Print_ISBN
    978-1-4244-3548-7
  • Electronic_ISBN
    1098-7576
  • Type
    conf
  • DOI
    10.1109/IJCNN.2009.5178777
  • Filename
    5178777
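
The abstract describes a score based approach in which a generative front end turns a varying-length sequence into a fixed-dimensional pattern for a discriminative classifier. The record gives no implementation details, so the following is only a minimal sketch under stated assumptions: each class is modeled by a single diagonal-covariance Gaussian (a stand-in for a GMM), the score is the average per-frame log-likelihood, and the function names and toy data are hypothetical. It is not the authors' method.

```python
import math

def fit_diag_gaussian(frames):
    """Fit a diagonal-covariance Gaussian to a list of feature frames.

    Assumption: one Gaussian per class stands in for a full GMM.
    """
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Floor the variance so a degenerate dimension cannot divide by zero.
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-6)
           for d in range(dim)]
    return mean, var

def avg_log_likelihood(sequence, model):
    """Average per-frame log-likelihood of a sequence under one class model."""
    mean, var = model
    total = 0.0
    for frame in sequence:
        for x, m, v in zip(frame, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(sequence)

def score_vector(sequence, class_models):
    """Map a varying-length sequence to one score per class model."""
    return [avg_log_likelihood(sequence, m) for m in class_models]

# Hypothetical toy data: 1-D frames, sequences of different lengths.
train = {
    "calm": [[[0.1], [0.0], [-0.2]], [[0.2], [-0.1]]],
    "angry": [[[2.9], [3.1]], [[3.0], [2.8], [3.2], [3.1]]],
}
labels = sorted(train)  # ["angry", "calm"]
class_models = [
    fit_diag_gaussian([frame for seq in train[c] for frame in seq])
    for c in labels
]

short_seq = [[0.05], [0.1]]                       # 2 frames
long_seq = [[3.0], [2.9], [3.1], [3.0], [2.95]]   # 5 frames
v_short = score_vector(short_seq, class_models)
v_long = score_vector(long_seq, class_models)
```

Both `v_short` and `v_long` have one entry per class regardless of sequence length, which is what lets a fixed-dimensional classifier such as an SVM be trained on them in a second stage.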