DocumentCode
2801641
Title
IITKGP-SEHSC : Hindi Speech Corpus for Emotion Analysis
Author
Koolagudi, Shashidhar G. ; Reddy, Ramu ; Yadav, Jainath ; Rao, K. Sreenivasa
Author_Institution
Sch. of Inf. Technol., Indian Inst. of Technol., Kharagpur, India
fYear
2011
fDate
24-25 Feb. 2011
Firstpage
1
Lastpage
5
Abstract
This paper introduces a simulated emotion Hindi speech corpus for analyzing the emotions present in speech signals. The proposed database was recorded by professional artists from the Gyanavani FM radio station, Varanasi, India. The speech corpus was collected by simulating eight different emotions using neutral (emotion-free) text prompts. The emotions present in the database are anger, disgust, fear, happy, neutral, sad, sarcastic and surprise. This speech corpus is named the Indian Institute of Technology Kharagpur Simulated Emotion Hindi Speech Corpus (IITKGP-SEHSC). Emotion classification is performed on the proposed IITKGP-SEHSC using prosodic and spectral features. Mel frequency cepstral coefficients (MFCCs) are used to represent spectral information. Energy, pitch and duration are used to represent prosodic information. The average emotion recognition performance using prosodic and spectral features is found to be around 77% and 81%, respectively, for female speech utterances. This paper describes the design, acquisition, post-processing and evaluation of the proposed speech corpus (IITKGP-SEHSC). The quality of the emotions expressed in the database is evaluated using subjective listening tests. The emotion recognition performance in subjective listening tests is observed to be around 74%. The results of the subjective listening tests are broadly on par with the results obtained using prosodic analysis of the database.
Keywords
cepstral analysis; emotion recognition; natural language processing; speech processing; Hindi speech corpus; Mel frequency cepstral coefficients; emotion classification; neutral text prompts; prosodic features; spectral features; speech signals; Computational modeling; Databases; Feature extraction; Speech; Speech recognition; Support vector machines
fLanguage
English
Publisher
IEEE
Conference_Titel
Devices and Communications (ICDeCom), 2011 International Conference on
Conference_Location
Mesra
Print_ISBN
978-1-4244-9189-6
Type
conf
DOI
10.1109/ICDECOM.2011.5738540
Filename
5738540