DocumentCode :
1544808
Title :
Modeling of the glottal flow derivative waveform with application to speaker identification
Author :
Plumpe, Michael D. ; Quatieri, Thomas F. ; Reynolds, Douglas A.
Author_Institution :
Microsoft Corp., Redmond, WA, USA
Volume :
7
Issue :
5
fYear :
1999
fDate :
September 1999
Firstpage :
569
Lastpage :
586
Abstract :
An automatic technique for estimating and modeling the glottal flow derivative source waveform from speech, and applying the model parameters to speaker identification, is presented. The estimate of the glottal flow derivative is decomposed into coarse structure, representing the general flow shape, and fine structure, comprising aspiration and other perturbations in the flow, from which model parameters are obtained. The glottal flow derivative is estimated using an inverse filter determined within a time interval of vocal-fold closure that is identified through differences in formant frequency modulation during the open and closed phases of the glottal cycle. This formant motion is predicted by Ananthapadmanabha and Fant (1982) to be a result of time-varying and nonlinear source/vocal tract coupling within a glottal cycle. The glottal flow derivative estimate is modeled using the Liljencrants-Fant (1986) model to capture its coarse structure, while the fine structure of the flow derivative is represented through energy and perturbation measures. The model parameters are used in a Gaussian mixture model speaker identification (SID) system. Both coarse- and fine-structure glottal features are shown to contain significant speaker-dependent information. For a large TIMIT database subset, averaging over male and female SID scores, the coarse-structure parameters achieve about 60% accuracy, the fine-structure parameters give about 40% accuracy, and their combination yields about 70% correct identification. Finally, in preliminary experiments on the counterpart telephone-degraded NTIMIT database, about a 5% error reduction in SID scores is obtained when source features are combined with traditional mel-cepstral measures.
Keywords :
Gaussian processes; filtering theory; inverse problems; parameter estimation; speaker recognition; speech processing; waveform analysis; Gaussian mixture model; Liljencrants-Fant model; aspiration; automatic technique; coarse structure; energy measures; error reduction; fine structure; fine-structure parameters; formant frequency modulation; formant motion; general flow shape; glottal cycle; glottal flow derivative source waveform; glottal flow derivative waveform; inverse filter; large TIMIT database subset; mel-cepstral measures; model parameters; nonlinear source/vocal tract coupling; perturbation measures; perturbations; speaker identification; speech waveform; telephone-degraded NTIMIT database; time interval; time-varying source/vocal tract coupling; vocal-fold closure; waveform estimation; waveform modeling; Couplings; Energy capture; Energy measurement; Filters; Frequency estimation; Frequency modulation; Phase estimation; Shape; Spatial databases; Speech;
fLanguage :
English
Journal_Title :
Speech and Audio Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1063-6676
Type :
jour
DOI :
10.1109/89.784109
Filename :
784109