• DocumentCode
    1596764
  • Title
    Pitch and MFCC dependent GMM models for speaker identification systems
  • Author
    Ezzaidi, Hassan; Rouat, Jean
  • Author_Institution
    Quebec Univ., Chicoutimi, Que., Canada
  • Volume
    1
  • fYear
    2004
  • Firstpage
    43
  • Abstract
    Recently, we proposed an approach to speaker identification that jointly exploits vocal tract and glottis source information. The approach synchronously takes into account the correlation between the two sources of information. The proposed theoretical model, which uses a joint law, is presented. Some restrictions and simplifications are introduced to show the significance of this approach in a practical way. The fundamental frequency and MFCCs (Mel frequency cepstrum coefficients) are used to represent the information of the source and the vocal tract, respectively. In that earlier work, the probability density of the source was assumed to obey a uniform law, and tests were carried out only with female speakers from a telephone speech database (SPIDRE) recorded from various telephone handsets. Here, it is proposed to model the source information by a Gaussian mixture model (GMM) rather than the uniform probabilistic model. Tests are extended to all speakers of the SPIDRE database, and four systems are proposed and compared. The first is a baseline system based on the MFCCs that does not use any information from the source. The second examines only the voiced segments of the vocal signal. The last two implement the suggested approach using two techniques. The source information is found to follow a normal distribution in one technique and a log-normal distribution in the other. With the proposed approach, the gain in performance is 10.5% for women, 7% for men and 8% for all speakers.
  • Keywords
    Gaussian processes; correlation methods; log normal distribution; normal distribution; speaker recognition; GMM; Gaussian mixture model; MFCC; Mel frequency cepstrum coefficients; fundamental frequency; glottis source information; lognormal distribution; pitch; probability density; speaker identification systems; speech telephony database; vocal signal voiced segments; uniform probabilistic model; vocal tract information; Cepstrum; Databases; Gaussian distribution; Information resources; Log-normal distribution; Mel frequency cepstral coefficient; Speech; System testing; Telephone sets; Telephony
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Electrical and Computer Engineering, 2004. Canadian Conference on
  • ISSN
    0840-7789
  • Print_ISBN
    0-7803-8253-6
  • Type
    conf
  • DOI
    10.1109/CCECE.2004.1344954
  • Filename
    1344954