DocumentCode :
2427258
Title :
Automatic gender classification using the mel frequency cepstrum of neutral and whispered speech: A comparative study
Author :
Nisha Meenakshi, G.; Ghosh, Prasanta Kumar
Author_Institution :
Electr. Eng., Indian Inst. of Sci. (IISc), Bangalore, India
fYear :
2015
fDate :
Feb. 27, 2015 - March 1, 2015
Firstpage :
1
Lastpage :
6
Abstract :
Unlike neutral speech, whispered speech lacks vocal fold vibration and therefore resembles unvoiced speech. Since information about a speaker's gender typically lies in the pitch resulting from vocal fold vibration (the source signal), identifying gender from whispered speech is more challenging than identifying it from neutral speech. In the absence of pitch, we study the use of the vocal tract filter, captured through the spectral envelope, for automatic gender classification (AGC) from whispered speech. The spectral envelope is represented by Mel frequency cepstral coefficients (MFCCs). We also compare the AGC performance obtained from neutral speech using only MFCCs with that obtained from whispered speech. AGC experiments using a set of 33 sentences spoken in neutral and whispered modes by 16 female and 20 male speakers reveal that, when only spectral shape information is used, the AGC accuracy using neutral speech is on average 4.5% (absolute) higher than that using whispered speech. This holds even when we use a subset of MFCCs obtained by a forward cepstral coefficient selection algorithm. However, the AGC accuracy obtained using the MFCCs of neutral speech is 2.83% (absolute) lower than that obtained using pitch. These findings suggest not only that the spectral shape carries gender-specific information but also that it carries less such information when a speaker whispers than when the speaker talks normally.
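The forward cepstral coefficient selection mentioned in the abstract can be illustrated with a generic greedy forward feature-selection loop. The sketch below is hypothetical and not the authors' implementation: the record's keywords suggest support vector machines were used, but to stay dependency-free this sketch swaps in a nearest-centroid classifier and uses leave-one-out accuracy as the selection criterion. The function names, the stopping rule (stop when no remaining coefficient improves accuracy) and the toy three-coefficient "MFCC" vectors are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def nearest_centroid_predict(train_x: List[Vector], train_y: List[str],
                             x: Vector, dims: List[int]) -> str:
    """Assign x to the class whose mean vector (over `dims` only) is closest.

    Stand-in classifier (assumption): the paper's keywords point to SVMs.
    """
    sums: Dict[str, List[float]] = defaultdict(lambda: [0.0] * len(dims))
    counts: Dict[str, int] = defaultdict(int)
    for vec, label in zip(train_x, train_y):
        row = sums[label]  # creates the class entry even if dims is short
        for j, d in enumerate(dims):
            row[j] += vec[d]
        counts[label] += 1

    def sq_dist(label: str) -> float:
        # Squared Euclidean distance to the class mean on the selected dims.
        return sum((x[d] - sums[label][j] / counts[label]) ** 2
                   for j, d in enumerate(dims))

    # Ties are broken by label name so the result is deterministic.
    return min(sums, key=lambda label: (sq_dist(label), label))


def loo_accuracy(xs: List[Vector], ys: List[str], dims: List[int]) -> float:
    """Leave-one-out accuracy of the classifier restricted to `dims`."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i], dims) == ys[i]:
            correct += 1
    return correct / len(xs)


def forward_select(xs: List[Vector], ys: List[str],
                   max_dims: Optional[int] = None) -> Tuple[List[int], float]:
    """Greedily add the cepstral coefficient that most improves accuracy."""
    selected: List[int] = []
    remaining = list(range(len(xs[0])))
    best_acc = 0.0
    while remaining and (max_dims is None or len(selected) < max_dims):
        scores = [(loo_accuracy(xs, ys, selected + [d]), d) for d in remaining]
        # Highest accuracy wins; ties go to the lower coefficient index.
        acc, d = max(scores, key=lambda t: (t[0], -t[1]))
        if acc <= best_acc:
            break  # no remaining coefficient improves accuracy
        selected.append(d)
        remaining.remove(d)
        best_acc = acc
    return selected, best_acc


# Hypothetical toy data: per-utterance averaged 3-coefficient vectors, where
# only coefficient 1 separates the two classes.
XS: List[Vector] = [
    [0.3, 1.0, 0.5], [0.9, 1.2, 0.1], [0.5, 0.9, 0.8],     # female
    [0.4, -1.0, 0.2], [0.8, -1.1, 0.7], [0.2, -0.9, 0.4],  # male
]
YS: List[str] = ["F", "F", "F", "M", "M", "M"]

if __name__ == "__main__":
    print(forward_select(XS, YS))  # prints ([1], 1.0)
```

On the toy data, coefficient 1 alone classifies every held-out vector correctly, so the loop stops after one step because adding either remaining coefficient cannot raise the accuracy above 1.0. With real MFCC features, the same loop would return the subset of coefficients that the chosen criterion favours.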
Keywords :
speech processing; vibrations; AGC; MFCC; Mel frequency cepstral coefficient; automatic gender classification; forward cepstral coefficient selection algorithm; neutral speech; source signal; spectral envelope; spectral shape information; unvoiced speech; vocal fold vibration; vocal tract filter; whispered speech; Accuracy; Spectral shape; Speech; Support vector machines; Training
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Communications (NCC), 2015 Twenty First National Conference on
Conference_Location :
Mumbai
Type :
conf
DOI :
10.1109/NCC.2015.7084886
Filename :
7084886