Spectral subtraction and RASTA-filtering in text-dependent HMM-based speaker verification

Author

Hardt, Detlef; Fellbaum, Klaus

Author_Institution

Inst. of Telecommun. & Theor. Electr. Eng., Tech. Univ. Berlin, Germany

Volume

2

fYear

1997

fDate

21-24 Apr 1997

Firstpage

867

Abstract

In real text-dependent, telephone-based speaker verification systems, both additive and convolutional noise influence the error rate considerably. This paper compares different procedures that make a speaker verification system more robust against noise. We use either spectral subtraction in addition to MFCC feature extraction, or PLP and RASTA-PLP alone (without spectral subtraction). For spectral subtraction, two variants were examined: one applied as a preprocessing stage ahead of the system, and a second integrated into the MFCC computation. The first variant has the advantage that its window length can be chosen independently of that of the MFCC procedure. This led to better results. However, the most effective procedure for telephone speech data is J-RASTA-PLP, although estimating the optimal J factor is difficult. We first used a fixed J factor based on off-line measurement of the noise power. Finally, we performed some experiments to optimize the system by adaptively estimating the J factor during the utterance. This procedure is based on the method of spectral mapping, which has been shown to be very effective in automatic speech recognition.
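
The abstract does not give the details of the spectral subtraction stage, so the following is only a generic sketch of magnitude-domain spectral subtraction used as a stand-alone preprocessing step, with its own window length chosen independently of any later MFCC analysis. The function name `spectral_subtraction`, all parameter values, and the assumption that the first few frames contain noise only are illustrative choices, not taken from the paper.

```python
import numpy as np

def spectral_subtraction(signal, frame_len=256, hop=128, noise_frames=5, floor=0.01):
    """Illustrative magnitude spectral subtraction with overlap-add resynthesis."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectra = np.fft.rfft(frames, axis=1)
    magnitude, phase = np.abs(spectra), np.angle(spectra)
    # Assumption: the first `noise_frames` frames contain noise only.
    noise_mag = magnitude[:noise_frames].mean(axis=0)
    # Subtract the noise estimate and clamp to a small spectral floor
    # so that magnitudes never become negative.
    cleaned = np.maximum(magnitude - noise_mag, floor * noise_mag)
    # Resynthesize with the noisy phase and overlap-add the frames.
    out_frames = np.fft.irfft(cleaned * np.exp(1j * phase), n=frame_len, axis=1)
    out = np.zeros(len(signal))
    for i in range(n_frames):
        out[i * hop : i * hop + frame_len] += out_frames[i]
    return out
```

Because the enhanced waveform is resynthesized before feature extraction, `frame_len` and `hop` here are free parameters. A variant integrated into the MFCC computation would instead have to reuse the MFCC analysis window.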

Keywords

acoustic filters; acoustic noise; adaptive estimation; error statistics; feature extraction; hidden Markov models; optimisation; speaker recognition; spectral analysis; telephony; J-RASTA-PLP; MFCC-feature extraction; PLP; RASTA-PLP; RASTA-filtering; additive noise; automatic speech recognition; convolutional noise; error rate; noise power; optimal J factor; real text-dependent telephone-based speaker verification systems; spectral mapping; spectral subtraction; telephone speech data; text-dependent HMM-based speaker verification; utterance; window length; Error analysis; Mel frequency cepstral coefficient; Noise robustness; Spatial databases; Speech enhancement

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech, and Signal Processing, 1997. ICASSP-97., 1997 IEEE International Conference on

Conference_Location

Munich

ISSN

1520-6149

Print_ISBN

0-8186-7919-0

Type

conf

DOI

10.1109/ICASSP.1997.596073

Filename

596073