Regional Information Center for Science and Technology - Automatic gender classification using the mel frequency cepstrum of neutral and whispered speech: A comparative study

DocumentCode :

2427258

Title :

Automatic gender classification using the mel frequency cepstrum of neutral and whispered speech: A comparative study

Author :

Nisha Meenakshi, G. ; Ghosh, Prasanta Kumar

Author_Institution :

Electr. Eng., Indian Inst. of Sci. (IISc), Bangalore, India

Year :

2015

Date :

Feb. 27 2015-March 1 2015

Firstpage :

Lastpage :

Abstract :

Whispered speech resembles unvoiced speech because, unlike neutral speech, it lacks vocal fold vibration. Since information about a speaker's gender typically lies in the pitch resulting from vocal fold vibration (the source signal), identifying gender from whispered speech is more challenging than from neutral speech. In the absence of pitch, we study the use of the vocal tract filter, captured through the spectral envelope, for automatic gender classification (AGC) from whispered speech. The spectral envelope is represented by Mel frequency cepstral coefficients (MFCCs). We also compare the AGC performance on neutral speech using only MFCCs with that on whispered speech. An AGC experiment using a set of 33 sentences spoken in neutral and whispered modes by 16 female and 20 male speakers reveals that the AGC accuracy using neutral speech is, on average, 4.5% (absolute) higher than that using whispered speech when only the spectral shape information is used. This holds even when we use a subset of MFCCs obtained by a forward cepstral coefficient selection algorithm. However, the AGC accuracy obtained using the MFCCs of neutral speech is 2.83% (absolute) lower than that obtained using pitch. These findings not only suggest that the spectral shape contains gender-specific information but also indicate that it carries less gender-specific information when a speaker whispers than when speaking normally.
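
The abstract names a forward cepstral coefficient selection algorithm but this record does not describe it. As a rough illustration only, the sketch below shows generic greedy forward selection over feature dimensions. Everything in it is an assumption made for the example: the feature vectors are synthetic 13-dimensional stand-ins for MFCCs (no real speech is processed), the nearest-centroid classifier is a simple substitute rather than the paper's classifier (the keywords mention support vector machines), and the function names are hypothetical.

```python
import random


def nearest_centroid_accuracy(train, test, dims):
    """Accuracy of a nearest-centroid classifier that sees only `dims`.

    `train` and `test` are lists of (feature_vector, label) pairs.
    This is a stand-in classifier, not the one used in the paper.
    """
    centroids = {}
    for label in {y for _, y in train}:
        rows = [x for x, y in train if y == label]
        centroids[label] = [sum(r[d] for r in rows) / len(rows) for d in dims]
    correct = 0
    for x, y in test:
        # Pick the class whose centroid is closest in squared Euclidean distance.
        predicted = min(
            centroids,
            key=lambda c: sum((x[d] - m) ** 2 for d, m in zip(dims, centroids[c])),
        )
        correct += predicted == y
    return correct / len(test)


def forward_select(train, val, n_dims, k):
    """Greedily add the dimension that gives the best validation accuracy."""
    selected, history = [], []
    remaining = list(range(n_dims))
    while remaining and len(selected) < k:
        scored = [
            (nearest_centroid_accuracy(train, val, selected + [d]), d)
            for d in remaining
        ]
        # Ties are broken in favour of the lower coefficient index.
        best_acc, best_dim = max(scored, key=lambda t: (t[0], -t[1]))
        selected.append(best_dim)
        remaining.remove(best_dim)
        history.append(best_acc)
    return selected, history


def make_synthetic(n_per_class, n_dims, informative, rng):
    """Synthetic data: only the `informative` dimensions differ between classes."""
    data = []
    for label in ("female", "male"):
        for _ in range(n_per_class):
            vec = [rng.gauss(0.0, 1.0) for _ in range(n_dims)]
            if label == "male":
                for d in informative:
                    vec[d] += 3.0  # arbitrary class offset, chosen for the demo
            data.append((vec, label))
    return data


rng = random.Random(0)
INFORMATIVE = (2, 5)  # hypothetical "gender-bearing" coefficient indices
train = make_synthetic(100, 13, INFORMATIVE, rng)
val = make_synthetic(100, 13, INFORMATIVE, rng)
selected, history = forward_select(train, val, n_dims=13, k=3)
print("selected dimensions:", selected)
print("validation accuracy after each step:", history)
```

With real data, each feature vector would be replaced by MFCCs computed from a neutral or whispered utterance, and the validation split would need to keep speakers disjoint from the training split so that accuracy reflects gender rather than speaker identity.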

Keywords :

speech processing; vibrations; AGC; MFCC; Mel frequency cepstral coefficient; automatic gender classification; forward cepstral coefficient selection algorithm; neutral speech; source signal; spectral envelope; spectral shape information; unvoiced speech; vocal fold vibration; vocal tract filter; whispered speech; Accuracy; Spectral shape; Speech; Support vector machines; Training

Language :

English

Publisher :

IEEE

Conference_Title :

Communications (NCC), 2015 Twenty First National Conference on

Conference_Location :

Mumbai

Type :

conf

DOI :

10.1109/NCC.2015.7084886

Filename :

7084886

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2427258