DocumentCode
48715
Title
Speech Enhancement Under Low SNR Conditions Via Noise Estimation Using Sparse and Low-Rank NMF with Kullback–Leibler Divergence
Author
Meng Sun ; Yinan Li ; Gemmeke, Jort F. ; Xiongwei Zhang
Author_Institution
Lab. of Intell. Inf. Process., PLA Univ. of Sci. & Technol., Nanjing, China
Volume
23
Issue
7
fYear
2015
fDate
Jul-15
Firstpage
1233
Lastpage
1242
Abstract
A key stage in speech enhancement is noise estimation, which usually requires prior models for speech, noise, or both. However, prior models can be difficult to obtain. In this paper, sparse and low-rank nonnegative matrix factorization (NMF) with Kullback-Leibler divergence is proposed for noise and speech estimation without any prior knowledge of speech or noise: the input noisy magnitude spectrogram is decomposed into a low-rank noise part and a sparse speech-like part. This initial unsupervised speech-noise estimate is then used to set up a subsequent regularized NMF or convolutional NMF that reconstructs the noise and speech spectrograms, either by estimating a speech dictionary on the fly (the unsupervised approaches) or by using a speech dictionary pre-trained on utterances from disjoint speakers (the semi-supervised approaches). Information fusion was investigated by taking the geometric mean of the outputs of multiple enhancement algorithms. The performance of the algorithms was evaluated on five metrics (PESQ, SDR, SNR, STOI, and OVERALL) in experiments on TIMIT with 15 noise types. Under low input SNR conditions, the geometric means of the proposed unsupervised approaches outperformed spectral subtraction (SS) and minimum mean square estimation (MMSE). All the proposed semi-supervised approaches outperformed SS and MMSE, and under low SNR conditions they also performed better than state-of-the-art algorithms that use a prior noise or speech dictionary.
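Two building blocks named in the abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' algorithm: it shows plain KL-divergence NMF with the standard multiplicative updates (without the sparse and low-rank terms the paper adds) and a geometric-mean fusion of several magnitude estimates. The function names, the rank, the iteration count, and the `eps` guard are all assumptions made for this example.

```python
import numpy as np

def kl_nmf(V, rank, n_iter=200, eps=1e-12, seed=0):
    """Plain KL-divergence NMF, V ~ W @ H, via multiplicative updates.

    Assumption: this is the generic unregularized form, not the
    sparse and low-rank variant described in the abstract.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps   # nonnegative dictionary
    H = rng.random((rank, n_frames)) + eps  # nonnegative activations
    for _ in range(n_iter):
        R = V / (W @ H + eps)               # element-wise ratio V / (WH)
        H *= (W.T @ R) / (W.sum(axis=0)[:, None] + eps)
        R = V / (W @ H + eps)
        W *= (R @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H

def kl_divergence(V, V_hat, eps=1e-12):
    """Generalized KL divergence D(V || V_hat) summed over all entries."""
    return float(np.sum(V * np.log((V + eps) / (V_hat + eps)) - V + V_hat))

def geometric_mean_fusion(estimates, eps=1e-12):
    """Element-wise geometric mean of same-shaped nonnegative spectrograms."""
    stacked = np.stack(estimates)
    return np.exp(np.mean(np.log(stacked + eps), axis=0))
```

In this sketch, `V` stands for a nonnegative magnitude spectrogram (frequency bins by frames), and `geometric_mean_fusion` would be applied to the enhanced magnitude spectrograms produced by several different algorithms for the same utterance.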
Keywords
matrix decomposition; speech enhancement; Kullback-Leibler divergence; MMSE; convolutional NMF; information fusion; input noisy magnitude spectrogram; low input SNR conditions; low-rank NMF; minimum mean square estimation; multiple enhancement algorithms; nonnegative matrix factorization; pre-trained speech dictionary; sparse NMF; spectral subtraction; speech enhancement; speech spectrogram; unsupervised speech-noise estimation; Dictionaries; Hidden Markov models; Noise; Spectrogram; Speech; Speech enhancement; Blockwise/convolutional nonnegative matrix factorization; sparse and low-rank decomposition; speech enhancement;
fLanguage
English
Journal_Title
Audio, Speech, and Language Processing, IEEE/ACM Transactions on
Publisher
ieee
ISSN
2329-9290
Type
jour
DOI
10.1109/TASLP.2015.2427520
Filename
7097695