DocumentCode :
1060063
Title :
Discrimination Power of Vocal Source and Vocal Tract Related Features for Speaker Segmentation
Author :
Chan, Wai Nang ; Zheng, Nengheng ; Lee, Tan
Author_Institution :
Chinese Univ. of Hong Kong, Hong Kong
Volume :
15
Issue :
6
Year :
2007
Firstpage :
1884
Lastpage :
1892
Abstract :
This paper presents an analysis of the speaker discrimination power of vocal source related features, in comparison to the conventional vocal tract related features. The vocal source features, named wavelet octave coefficients of residues (WOCOR), are extracted by a pitch-synchronous wavelet transform of the linear predictive (LP) residual signals. A series of controlled experiments shows that WOCOR is less sensitive to spoken content than the conventional MFCC features and thus more discriminative when the amount of training data is limited. These advantages of WOCOR are exploited in the task of speaker segmentation for telephone conversations, in which statistical speaker models need to be built upon short speech segments. Experimental results show that the proposed use of WOCOR leads to a noticeable reduction of segmentation errors.
Keywords :
speech processing; statistical analysis; linear predictive residual signals; pitch-synchronous wavelet transform; segmentation errors reduction; speaker segmentation; statistical speaker; telephone conversation; training data; vocal source power discrimination; vocal tract related features; wavelet octave coefficients; Acoustic testing; Cepstral analysis; Data mining; Feature extraction; Loudspeakers; Mel frequency cepstral coefficient; Speaker recognition; Speech; Telephony; Training data; Speaker discrimination power; speaker segmentation; vocal source features; vocal tract features;
Language :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2007.900103
Filename :
4276747