IITKGP-SEHSC : Hindi Speech Corpus for Emotion Analysis

Author

Koolagudi, Shashidhar G. ; Reddy, Ramu ; Yadav, Jainath ; Rao, K. Sreenivasa

Author_Institution

Sch. of Inf. Technol., Indian Inst. of Technol., Kharagpur, India

fYear

2011

fDate

24-25 Feb. 2011

Firstpage

1

Lastpage

5

Abstract

This paper introduces a simulated-emotion Hindi speech corpus for analyzing the emotions present in speech signals. The database was recorded by professional artists from the Gyanavani FM radio station, Varanasi, India. The corpus was collected by simulating eight different emotions using neutral (emotion-free) text prompts. The emotions present in the database are anger, disgust, fear, happy, neutral, sad, sarcastic and surprise. The corpus is named the Indian Institute of Technology Kharagpur Simulated Emotion Hindi Speech Corpus (IITKGP-SEHSC). Emotion classification is performed on IITKGP-SEHSC using prosodic and spectral features. Mel frequency cepstral coefficients (MFCCs) are used to represent spectral information; energy, pitch and duration are used to represent prosodic information. The average emotion recognition performance using prosodic and spectral features is found to be around 77% and 81%, respectively, for female speech utterances. This paper describes the design, acquisition, post-processing and evaluation of the corpus. The quality of the emotions expressed in the database is evaluated using subjective listening tests, in which the emotion recognition performance is observed to be around 74%. The results of the subjective listening tests are roughly on par with those obtained using prosodic analysis of the database.

Keywords

cepstral analysis; emotion recognition; natural language processing; speech processing; Hindi speech corpus; Mel frequency cepstral coefficients; emotion classification; emotion recognition; neutral text prompts; prosodic features; spectral features; speech signals; Computational modeling; Databases; Emotion recognition; Feature extraction; Speech; Speech recognition; Support vector machines;

fLanguage

English

Publisher

IEEE

Conference_Titel

Devices and Communications (ICDeCom), 2011 International Conference on

Conference_Location

Mesra

Print_ISBN

978-1-4244-9189-6

Type

conf

DOI

10.1109/ICDECOM.2011.5738540

Filename

5738540