Regional Information Center for Science and Technology - Speaker Identification and Verification by Combining MFCC and Phase Information

DocumentCode :

1351852

Title :

Speaker Identification and Verification by Combining MFCC and Phase Information

Author :

Nakagawa, Seiichi ; Wang, Longbiao ; Ohtsuka, Shinji

Author_Institution :

Dept. of Inf. & Comput. Sci., Toyohashi Univ. of Technol., Toyohashi, Japan

Volume :

Issue :

fYear :

2012

fDate :

5/1/2012 12:00:00 AM

Firstpage :

1085

Lastpage :

1095

Abstract :

In conventional speaker recognition methods based on Mel-frequency cepstral coefficients (MFCCs), phase information has hitherto been ignored. In this paper, we propose a phase information extraction method that normalizes the variation in the phase according to the frame position of the input speech, and we combine the phase information with MFCCs in text-independent speaker identification and verification methods. There is a problem with the original phase information extraction method when comparing two phase values. For example, the difference between the two values $\pi - \tilde{\theta}_1$ and $\tilde{\theta}_2 = -\pi + \tilde{\theta}_1$ is $2\pi - 2\tilde{\theta}_1$. If $\tilde{\theta}_1 \approx 0$, then the difference is approximately $2\pi$, despite the two phases being very similar to one another. To address this problem, we map the phase into coordinates on a unit circle. Speaker identification and verification experiments are performed using the NTT database, which consists of sentences uttered by 35 (22 male and 13 female) Japanese speakers in normal, fast and slow speaking modes during five sessions. Although the phase information-based method performs worse than the MFCC-based method, it augments the MFCC, and the combination is useful for speaker recognition. The proposed modified phase information is more robust than the original phase information for all speaking modes. By integrating the modified phase information with the MFCCs, the speaker identification rate was improved to 98.8% from 97.4% (MFCC), and the equal error rate for speaker verification was reduced to 0.45% from 0.72% (MFCC). We also conducted speaker identification and verification experiments on the large-scale Japanese Newspaper Article Sentences (JNAS) database and obtained a trend similar to that on the NTT database.
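The wrap-around problem described in the abstract can be illustrated with a small sketch. It assumes the unit-circle mapping takes a phase θ to the point (cos θ, sin θ) and compares two phases by the Euclidean distance between their points; the abstract does not spell out either detail, and the function names and the value of `theta_1` are hypothetical, chosen only for illustration.

```python
import math

def to_unit_circle(theta):
    """Map a phase angle in radians to (cos, sin) coordinates (assumed mapping)."""
    return (math.cos(theta), math.sin(theta))

def raw_difference(theta_a, theta_b):
    """Naive comparison: absolute difference of the two phase values."""
    return abs(theta_a - theta_b)

def circle_distance(theta_a, theta_b):
    """Euclidean distance between the two phases after mapping to the unit circle."""
    ax, ay = to_unit_circle(theta_a)
    bx, by = to_unit_circle(theta_b)
    return math.hypot(ax - bx, ay - by)

# Hypothetical small offset standing in for theta_1 in the abstract's example.
theta_1 = 0.01
phase_a = math.pi - theta_1    # just below +pi
phase_b = -math.pi + theta_1   # just above -pi: nearly the same direction

raw = raw_difference(phase_a, phase_b)      # equals 2*pi - 2*theta_1
mapped = circle_distance(phase_a, phase_b)  # equals 2*sin(theta_1)

print(f"raw difference:  {raw:.4f}")
print(f"circle distance: {mapped:.4f}")
```

The raw difference is close to 2π even though the two phases point in almost the same direction, while the distance between the mapped points stays small.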

Keywords :

cepstral analysis; speaker recognition; MFCC; Mel-frequency cepstral coefficients; NTT database; phase information; speaker identification; speaker verification; Delay; Humans; Mel frequency cepstral coefficient; Shape; Speech; Speech recognition; Gaussian mixture model (GMM); Mel-frequency cepstral coefficient (MFCC)

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2011.2172422

Filename :

6047571

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1351852