DocumentCode :
1351852
Title :
Speaker Identification and Verification by Combining MFCC and Phase Information
Author :
Nakagawa, Seiichi ; Wang, Longbiao ; Ohtsuka, Shinji
Author_Institution :
Dept. of Inf. & Comput. Sci., Toyohashi Univ. of Technol., Toyohashi, Japan
Volume :
20
Issue :
4
fYear :
2012
fDate :
5/1/2012
Firstpage :
1085
Lastpage :
1095
Abstract :
Conventional speaker recognition methods based on Mel-frequency cepstral coefficients (MFCCs) have so far ignored phase information. In this paper, we propose a phase information extraction method that normalizes the variation in phase according to the frame position of the input speech, and we combine the phase information with MFCCs in text-independent speaker identification and verification. The original phase information extraction method has a problem when two phase values are compared. For example, the difference between the two values $\pi - \tilde{\theta}_1$ and $\tilde{\theta}_2 = -\pi + \tilde{\theta}_1$ is $2\pi - 2\tilde{\theta}_1$. If $\tilde{\theta}_1 \approx 0$, the difference is approximately $2\pi$, even though the two phases are very similar to one another. To address this problem, we map the phase into coordinates on a unit circle. Speaker identification and verification experiments are performed using the NTT database, which consists of sentences uttered by 35 Japanese speakers (22 male and 13 female) in normal, fast, and slow speaking modes over five sessions. Although the phase information-based method performs worse than the MFCC-based method, it complements the MFCCs, and the combination is useful for speaker recognition. The proposed modified phase information is more robust than the original phase information for all speaking modes. By integrating the modified phase information with the MFCCs, the speaker identification rate improved from 97.4% (MFCC) to 98.8%, and the equal error rate for speaker verification fell from 0.72% (MFCC) to 0.45%. We also conducted speaker identification and verification experiments on the large-scale Japanese Newspaper Article Sentences (JNAS) database and obtained a trend similar to that on the NTT database.
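The wrap-around problem described in the abstract can be illustrated with a small numeric sketch. This is not the authors' implementation: the function names, the Euclidean distance, and the offset `eps` (standing in for $\tilde{\theta}_1$) are assumptions chosen for illustration. It only shows why two phases near $+\pi$ and $-\pi$ look far apart as raw angles but close once mapped to (cos, sin) coordinates on the unit circle.

```python
import math

def to_unit_circle(theta):
    """Map a phase angle in radians to (cos, sin) coordinates on the unit circle."""
    return (math.cos(theta), math.sin(theta))

def raw_difference(a, b):
    """Absolute difference between two phase values, with no wrap-around handling."""
    return abs(a - b)

def circle_distance(a, b):
    """Euclidean distance between the unit-circle coordinates of two phases."""
    ax, ay = to_unit_circle(a)
    bx, by = to_unit_circle(b)
    return math.hypot(ax - bx, ay - by)

# Hypothetical small offset playing the role of theta-tilde_1 in the abstract.
eps = 0.01
theta_a = math.pi - eps    # just below +pi
theta_b = -math.pi + eps   # just above -pi

raw = raw_difference(theta_a, theta_b)      # equals 2*pi - 2*eps, close to 2*pi
mapped = circle_distance(theta_a, theta_b)  # chord length 2*sin(eps), close to 0

print(f"raw difference:        {raw:.4f}")
print(f"unit-circle distance:  {mapped:.4f}")
```

With these assumed values the raw difference is close to $2\pi$, while the distance between the mapped coordinates is close to zero, which matches the intuition that the two phases are nearly identical.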
Keywords :
cepstral analysis; speaker recognition; MFCC; Mel-frequency cepstral coefficients; NTT database; phase information; speaker identification; speaker verification; Delay; Humans; Mel frequency cepstral coefficient; Shape; Speech; Speech recognition; Gaussian mixture model (GMM); Mel-frequency cepstral coefficient (MFCC)
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2011.2172422
Filename :
6047571