• DocumentCode
    2801641
  • Title
    IITKGP-SEHSC: Hindi Speech Corpus for Emotion Analysis
  • Author
    Koolagudi, Shashidhar G.; Reddy, Ramu; Yadav, Jainath; Rao, K. Sreenivasa
  • Author_Institution
    Sch. of Inf. Technol., Indian Inst. of Technol., Kharagpur, India
  • fYear
    2011
  • fDate
    24-25 Feb. 2011
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    This paper introduces a simulated-emotion Hindi speech corpus for analyzing the emotions present in speech signals. The database was recorded by professional artists from the Gyanavani FM radio station, Varanasi, India. The corpus was collected by simulating eight emotions using neutral (emotion-free) text prompts: anger, disgust, fear, happy, neutral, sad, sarcastic and surprise. The corpus is named the Indian Institute of Technology Kharagpur Simulated Emotion Hindi Speech Corpus (IITKGP-SEHSC). Emotion classification is performed on IITKGP-SEHSC using prosodic and spectral features. Mel frequency cepstral coefficients (MFCCs) represent the spectral information; energy, pitch and duration represent the prosodic information. The average emotion recognition performance using prosodic and spectral features is found to be around 77% and 81%, respectively, for female speech utterances. The paper describes the design, acquisition, post-processing and evaluation of the corpus. The quality of the emotions expressed in the database is evaluated using subjective listening tests, in which the emotion recognition performance is observed to be around 74%. The results of the subjective listening tests are broadly on par with those obtained using prosodic analysis of the database.
  • Keywords
    cepstral analysis; emotion recognition; natural language processing; speech processing; Hindi speech corpus; Mel frequency cepstral coefficients; emotion classification; emotion recognition; neutral text prompts; prosodic features; spectral features; speech signals; Computational modeling; Databases; Emotion recognition; Feature extraction; Speech; Speech recognition; Support vector machines;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Devices and Communications (ICDeCom), 2011 International Conference on
  • Conference_Location
    Mesra
  • Print_ISBN
    978-1-4244-9189-6
  • Type
    conf
  • DOI
    10.1109/ICDECOM.2011.5738540
  • Filename
    5738540