Incorporating a psychoacoustical model in frequency domain speech enhancement

Author

Hu, Yi ; Loizou, Philipos C.

Author_Institution

Dept. of Electr. Eng., Univ. of Texas, Richardson, TX, USA

Volume

11

Issue

2

fYear

2004

Firstpage

270

Lastpage

273

Abstract

A frequency-domain optimal linear estimator is proposed which incorporates the masking properties of the human auditory system to make the residual noise distortion inaudible. The use of wavelet-thresholded multitaper spectra is also proposed for frequency-domain speech enhancement methods as an alternative to the traditional fast Fourier transform (FFT)-based magnitude spectra. Experiments with multitalker babble noise indicated that the proposed estimator outperformed the minimum mean-square error log-spectral amplitude (MMSE-LSA) estimator, particularly when wavelet-thresholded multitaper spectra were used in place of the FFT spectra.

Keywords

acoustic noise; frequency-domain analysis; hearing; least mean squares methods; spectral analysis; speech enhancement; MMSE; fast Fourier transform based magnitude spectra; frequency domain optimal linear estimator; frequency domain speech enhancement method; human auditory system; masking property; minimum mean-square error log-spectral amplitude estimator; multitalker babble noise; musical noise; power spectrum estimation; psychoacoustical model; residual noise distortion; wavelet-thresholded multitaper spectra; Amplitude estimation; Auditory system; Fast Fourier transforms; Frequency domain analysis; Frequency estimation; Humans; Noise level; Psychoacoustic models; Psychology; Speech enhancement

fLanguage

English

Journal_Title

Signal Processing Letters, IEEE

Publisher

IEEE

ISSN

1070-9908

Type

jour

DOI

10.1109/LSP.2003.821714

Filename

1261997