DocumentCode :
1096999
Title :
Modeling Prosodic Features With Joint Factor Analysis for Speaker Verification
Author :
Dehak, Najim ; Dumouchel, Pierre ; Kenny, Patrick
Author_Institution :
CRIM, Montreal
Volume :
15
Issue :
7
Year :
2007
Firstpage :
2095
Lastpage :
2103
Abstract :
In this paper, we introduce the use of continuous prosodic features for speaker recognition, and we show how they can be modeled using joint factor analysis. Similar features have been successfully used in language identification. These prosodic features are pitch and energy contours spanning a syllable-like unit. They are extracted using a basis consisting of Legendre polynomials. Since the feature vectors are continuous (rather than discrete), they can be modeled using a standard Gaussian mixture model (GMM). Furthermore, speaker and session variability effects can be modeled in the same way as in conventional joint factor analysis. We find that the best results are obtained when the pitch, energy, and duration of the unit are all used together. Testing on the core condition of the NIST 2006 speaker recognition evaluation data gives equal error rates of 16.6% and 14.6%, with prosodic features alone, for all trials and English-only trials, respectively. When the prosodic system is fused with a state-of-the-art cepstral joint factor analysis system, we obtain a relative improvement of 8% (all trials) and 12% (English only) compared to the cepstral system alone.
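The feature extraction step described in the abstract (pitch and energy contours over a syllable-like unit, projected onto a Legendre polynomial basis, together with the unit duration) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `contour_features`, the polynomial order, the frame-level inputs, and the use of the frame count as the duration are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np
from numpy.polynomial import legendre


def contour_features(pitch, energy, order=5):
    """Project one unit's pitch and energy contours onto Legendre polynomials.

    Hypothetical helper: `order` and the duration measure (frame count)
    are illustrative assumptions, not values taken from the paper.
    """
    pitch = np.asarray(pitch, dtype=float)
    energy = np.asarray(energy, dtype=float)
    if pitch.shape != energy.shape:
        raise ValueError("pitch and energy must have the same length")
    n_frames = pitch.shape[0]
    if n_frames <= order:
        raise ValueError("unit is too short for the requested order")

    # Map the unit's time axis onto [-1, 1], where the Legendre
    # polynomials are orthogonal.
    t = np.linspace(-1.0, 1.0, n_frames)

    # Least-squares fit; each call returns order + 1 coefficients.
    pitch_coef = legendre.legfit(t, pitch, order)
    energy_coef = legendre.legfit(t, energy, order)

    # One continuous feature vector per unit: contour shapes plus duration.
    return np.concatenate([pitch_coef, energy_coef, [float(n_frames)]])


# Usage on a synthetic 20-frame unit (values are made up).
t = np.linspace(-1.0, 1.0, 20)
demo_pitch = 100.0 + 10.0 * t + 5.0 * (3.0 * t**2 - 1.0) / 2.0
demo_energy = 60.0 - 3.0 * t
features = contour_features(demo_pitch, demo_energy, order=2)
```

Because each unit yields a fixed-length continuous vector regardless of its duration, vectors of this kind can be modeled with a GMM, as the abstract notes.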
Keywords :
Gaussian processes; Legendre polynomials; cepstral analysis; speaker recognition; vectors; English-only trials; cepstral joint factor analysis system; feature vector; language identification; prosodic features modeling; speaker verification; standard Gaussian mixture model; Data mining; Error analysis; Feature extraction; Mel frequency cepstral coefficient; NIST; Polynomials; Statistics; Testing; Joint factor analysis; prosodic features
Language :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2007.902758
Filename :
4291597