Pitch and MFCC dependent GMM models for speaker identification systems

Author

Ezzaidi, Hassan ; Rouat, Jean

Author_Institution

Quebec Univ., Chicoutimi, Que., Canada

Volume

1

fYear

2004

Firstpage

43

Abstract

Recently, we proposed an approach to speaker identification that jointly exploits vocal tract and glottal source information. The approach synchronously takes into account the correlation between the two sources of information. The proposed theoretical model, which uses a joint law, is presented. Some restrictions and simplifications are applied to show the significance of this approach in a practical way. The fundamental frequency and MFCCs (Mel frequency cepstrum coefficients) are used to represent the information of the source and the vocal tract, respectively. In that earlier work, the probability density of the source was assumed to obey a uniform law, and tests were carried out with only the female speakers of a telephone speech database (SPIDRE) recorded from various telephone handsets. Here, it is proposed to model the source information by a Gaussian mixture model (GMM) rather than the uniform probabilistic model. Tests were extended to all speakers of the SPIDRE database, and four systems were proposed and compared. The first is a baseline system based on the MFCCs that does not use any information from the source. The second examines only the voiced segments of the speech signal. The last two implement the proposed approach, one for each of the two techniques: the source information is found to follow a normal distribution in one technique and a log-normal distribution in the other. With the proposed approach, the gain in performance is 10.5% for women, 7% for men and 8% for all speakers.
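
The idea of scoring a speaker with both vocal tract and source information can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes, for simplicity, that the joint law factorizes as p(MFCC, F0 | speaker) = p(MFCC | speaker) p(F0 | speaker), which ignores the correlation between the two sources that the paper takes into account. The function names, the model dictionary layout, and the convention that unvoiced frames carry F0 = 0 are all hypothetical.

```python
import numpy as np

def gmm_loglik(frames, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D); weights: (K,); means, variances: (K, D).
    """
    frames = np.asarray(frames, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    diff = frames[:, None, :] - means[None, :, :]            # (T, K, D)
    log_comp = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    # Log-sum-exp over the K mixture components.
    return np.logaddexp.reduce(np.log(weights) + log_comp, axis=1)

def lognormal_logpdf(f0, mu, sigma):
    """Log-density of pitch f0 (Hz) when log(f0) ~ N(mu, sigma^2)."""
    log_f0 = np.log(np.asarray(f0, dtype=float))
    return (-log_f0 - 0.5 * np.log(2.0 * np.pi * sigma ** 2)
            - (log_f0 - mu) ** 2 / (2.0 * sigma ** 2))

def identify(mfcc, f0, speakers):
    """Return the best-scoring speaker and all scores on voiced frames.

    Assumption: unvoiced frames are marked with f0 == 0 and are skipped.
    """
    mfcc = np.asarray(mfcc, dtype=float)
    f0 = np.asarray(f0, dtype=float)
    voiced = f0 > 0
    scores = {}
    for name, m in speakers.items():
        ll = gmm_loglik(mfcc[voiced], m["weights"], m["means"], m["variances"])
        ll = ll + lognormal_logpdf(f0[voiced], m["f0_mu"], m["f0_sigma"])
        scores[name] = float(ll.sum())
    return max(scores, key=scores.get), scores
```

Replacing `lognormal_logpdf` with a Gaussian log-density on F0 itself would give the normal-distribution variant, and dropping the pitch term would leave a system that scores only the voiced frames.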

Keywords

Gaussian processes; correlation methods; log normal distribution; normal distribution; speaker recognition; GMM; Gaussian mixture model; MFCC; Mel frequency cepstrum coefficients; fundamental frequency; glottis source information; lognormal distribution; normal distribution; pitch; probability density; speaker identification systems; speech telephony database; the vocal signal voiced segments; uniform probabilistic model; vocal tract information; Cepstrum; Databases; Gaussian distribution; Information resources; Log-normal distribution; Mel frequency cepstral coefficient; Speech; System testing; Telephone sets; Telephony;

fLanguage

English

Publisher

IEEE

Conference_Titel

Electrical and Computer Engineering, 2004. Canadian Conference on

ISSN

0840-7789

Print_ISBN

0-7803-8253-6

Type

conf

DOI

10.1109/CCECE.2004.1344954

Filename

1344954