Title :
CASA-Based Robust Speaker Identification
Author :
Zhao, Xiaojia ; Shao, Yang ; Wang, DeLiang
Author_Institution :
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
Date :
1 July 2012
Abstract :
Conventional speaker recognition systems perform poorly under noisy conditions. Inspired by auditory perception, computational auditory scene analysis (CASA) typically segregates speech by producing a binary time-frequency mask. We investigate CASA for robust speaker identification. We first introduce a novel speaker feature, the gammatone frequency cepstral coefficient (GFCC), based on an auditory periphery model, and show that this feature captures speaker characteristics and performs substantially better than conventional speaker features under noisy conditions. To deal with noisy speech, we apply CASA separation and then either reconstruct or marginalize the corrupted components indicated by a CASA mask. We find that both reconstruction and marginalization are effective. We further combine the two methods into a single system that exploits their complementary advantages, and the combined system achieves significant performance improvements over related systems across a wide range of signal-to-noise ratios.
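To make the GFCC front end described in the abstract concrete, the following Python sketch illustrates one plausible extraction pipeline: pass the signal through a gammatone filterbank with ERB-spaced center frequencies, compute per-frame channel energies (a cochleagram), apply cubic-root loudness compression, and decorrelate the channels with a DCT. This is a minimal illustration, not the authors' implementation; the filterbank size, frame length, compression, coefficient count, and all helper names are assumptions for demonstration.

```python
import numpy as np
from scipy.signal import fftconvolve
from scipy.fftpack import dct

def erb_center_freqs(low, high, n):
    """n center frequencies equally spaced on the ERB-rate scale (descending)."""
    ear_q, min_bw = 9.26449, 24.7
    idx = np.arange(1, n + 1)
    return -(ear_q * min_bw) + np.exp(
        idx * (np.log(low + ear_q * min_bw) - np.log(high + ear_q * min_bw)) / n
    ) * (high + ear_q * min_bw)

def gammatone_impulse(fc, fs, duration=0.064, order=4, b_factor=1.019):
    """Impulse response of a 4th-order gammatone filter centered at fc (Hz)."""
    t = np.arange(0, duration, 1.0 / fs)
    erb = 24.7 + 0.108 * fc                      # equivalent rectangular bandwidth
    b = b_factor * erb
    return t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)

def gfcc(signal, fs=16000, n_channels=64, frame_ms=20, shift_ms=10, n_ceps=30):
    """Cochleagram energies -> cubic-root compression -> DCT across channels."""
    freqs = erb_center_freqs(50.0, fs / 2.0, n_channels)
    frame = int(fs * frame_ms / 1000)
    shift = int(fs * shift_ms / 1000)
    n_frames = 1 + max(0, (len(signal) - frame) // shift)
    cochleagram = np.empty((n_frames, n_channels))
    for c, fc in enumerate(freqs):
        response = fftconvolve(signal, gammatone_impulse(fc, fs), mode="same")
        for m in range(n_frames):
            seg = response[m * shift: m * shift + frame]
            cochleagram[m, c] = np.sum(seg ** 2)  # energy of one T-F unit
    compressed = np.cbrt(cochleagram)             # loudness (cubic-root) compression
    return dct(compressed, type=2, norm="ortho", axis=1)[:, :n_ceps]
```

For a 16-kHz utterance `x`, `gfcc(x)` returns a frames-by-coefficients matrix. In the pipeline the paper describes, such features would be scored against speaker models, with a CASA binary mask marking which time-frequency units are reliable so that corrupted components can be reconstructed or marginalized.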
Keywords :
cepstral analysis; speaker recognition; computational auditory scene analysis (CASA); gammatone frequency cepstral coefficient (GFCC); auditory periphery model; ideal binary mask; marginalization of corrupted components; robust speaker identification; signal-to-noise ratio; feature extraction; filter banks; noise measurement; robustness; speech
Journal_Title :
IEEE Transactions on Audio, Speech, and Language Processing
DOI :
10.1109/TASL.2012.2186803