Speech enhancement using segmental nonnegative matrix factorization

Author

Hao-teng Fan ; Jeih-weih Hung ; Xugang Lu ; Syu-Siang Wang ; Yu Tsao

Author_Institution

Dept. of Electr. Eng., Nat. Chi Nan Univ., Nantou, Taiwan

fYear

2014

fDate

4-9 May 2014

Firstpage

4483

Lastpage

4487

Abstract

The conventional NMF-based speech enhancement algorithm applies NMF to the magnitude spectrograms of both clean speech and noise in the training data to estimate a set of spectral basis vectors. These basis vectors span a space in which the magnitude spectrogram of a noise-corrupted test utterance is approximated. Finally, the components associated with the clean-speech spectral basis vectors are used to construct the updated magnitude spectrogram, producing an enhanced speech utterance. Considering that rich spectral-temporal structure can be exploited in local frequency and time-varying spectral patches, this study proposes a segmental NMF (SNMF) speech enhancement scheme to improve on the conventional frame-wise NMF-based method. Two algorithms are derived to decompose the original nonnegative matrix associated with the magnitude spectrogram: the first operates in the spectral domain and the second in the temporal domain. With these decomposition processes, noisy speech signals can be modeled more precisely, and the speech part of the spectrogram can be reconstructed more faithfully than with the conventional NMF-based method. Objective evaluations using perceptual evaluation of speech quality (PESQ) indicate that the proposed SNMF strategy improves sound quality in noisy conditions and outperforms the well-known MMSE log-spectral amplitude (LSA) estimator.
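
The conventional frame-wise pipeline summarized in the abstract (learn speech and noise bases, fit the noisy spectrogram with the joint basis, keep only the speech components) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes Euclidean-distance multiplicative updates, which the abstract does not specify, and the function names, ranks, and random stand-in spectrograms are all hypothetical. It does not cover the proposed segmental (SNMF) decompositions.

```python
import numpy as np

EPS = 1e-10  # guards against division by zero in the multiplicative updates


def nmf(V, rank, n_iter=200, W=None, seed=0):
    """Approximate V by W @ H using Euclidean multiplicative updates.

    Assumption: Euclidean cost. If W is supplied, it is held fixed and only
    the activations H are updated (rank is then taken from W).
    """
    rng = np.random.default_rng(seed)
    update_W = W is None
    if update_W:
        W = rng.random((V.shape[0], rank)) + EPS
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        if update_W:
            W *= (V @ H.T) / (W @ H @ H.T + EPS)
    return W, H


def enhance(V_noisy, W_speech, W_noise, n_iter=200):
    """Return the speech part of a noisy magnitude spectrogram."""
    W = np.hstack([W_speech, W_noise])  # joint dictionary, kept fixed
    _, H = nmf(V_noisy, W.shape[1], n_iter, W=W)
    k = W_speech.shape[1]
    return W_speech @ H[:k]  # keep only the speech components


# Random nonnegative matrices stand in for real magnitude spectrograms
# (rows: frequency bins, columns: frames).
rng = np.random.default_rng(1)
V_speech = rng.random((64, 100))
V_noise = rng.random((64, 100))
W_s, _ = nmf(V_speech, rank=8)  # training: clean-speech bases
W_n, _ = nmf(V_noise, rank=4)   # training: noise bases
V_hat = enhance(V_speech + V_noise, W_s, W_n)
```

In practice the enhanced magnitude spectrogram would be combined with a phase estimate (commonly the noisy phase) and inverted to a waveform; that step is omitted here.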

Keywords

least mean squares methods; matrix decomposition; speech enhancement; MMSE log spectral amplitude estimation; magnitude spectrograms; noisy speech signals; perceptual evaluation of speech quality; segmental nonnegative matrix factorization; spectral basis vectors; spectral domain; temporal domain; Hidden Markov models; Noise; Spectrogram; Speech; Vectors; NMF; nonnegative matrix factorization; patch processing; sub-band processing

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on

Conference_Location

Florence

Type

conf

DOI

10.1109/ICASSP.2014.6854450

Filename

6854450