Regional Information Center for Science and Technology (RICEST) - A Unified Framework for Designing Optimal STSA Estimators Assuming Maximum Likelihood Phase Equivalence of Speech and Noise

DocumentCode :

1520602

Title :

A Unified Framework for Designing Optimal STSA Estimators Assuming Maximum Likelihood Phase Equivalence of Speech and Noise

Author :

Borgstrom, Bengt J. ; Alwan, Abeer

Author_Institution :

Human Language Technol. Group, MIT Lincoln Lab., Lexington, MA, USA

Volume :

Issue :

Year :

2011

Firstpage :

2579

Lastpage :

2590

Abstract :

In this paper, we present a stochastic framework for designing optimal short-time spectral amplitude (STSA) estimators for speech enhancement assuming phase equivalence of speech and noise. By assuming additive superposition of speech and noise, which is implied by the maximum-likelihood (ML) phase estimate, we effectively project the optimal spectral amplitude estimation problem onto a 1-D subspace of the complex spectral plane, thus simplifying the problem formulation. Assuming generalized Gamma distributions (GGDs) for a priori distributions of both speech and noise STSAs, we derive separate families of novel estimators according to either the maximum-likelihood (ML), the minimum mean-square error (MMSE), or the maximum a posteriori (MAP) criterion. The use of GGDs allows optimal estimators to be determined in a generalized form, so that particular solutions can be obtained by substituting statistical shape parameters corresponding to expected speech and noise priors. It is interesting to note that several of the proposed estimators exhibit strong similarities to well-known STSA solutions. For example, the magnitude spectral subtracter (MSS) and Wiener filter (WF) are obtained for specific cases of GGD shape parameters. Quantitative analysis of a selected subset of the proposed estimators shows improvement over the traditional log-spectral MMSE estimator of Ephraim and Malah, in terms of segmental signal-to-noise ratio (SNR) and the COSH distance measure, when applied to the Noizeus database. Although single-channel speech enhancement is offered as an illustrative example, the theory presented here could be applicable to other signals, such as music and images.
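The abstract names the magnitude spectral subtracter (MSS) and the Wiener filter (WF) as special cases of the proposed estimators. This record does not include the derivations or the GGD shape parameters that yield them, so the following is only a minimal sketch of the standard textbook forms of those two rules, not the paper's estimators. All function and parameter names are hypothetical, and the noise amplitude and powers are assumed to be known inputs.

```python
# Illustrative sketch only: textbook MSS and Wiener rules for one spectral bin.
# These are NOT the GGD-derived estimators of the paper; names are hypothetical.

def mss_amplitude(noisy_amp, noise_amp):
    """Magnitude spectral subtraction.

    If speech and noise share the same phase, their amplitudes add
    (|Y| = |X| + |N|), so the speech amplitude is |Y| - |N|.
    The result is floored at zero to keep the amplitude non-negative.
    """
    return max(noisy_amp - noise_amp, 0.0)


def wiener_gain(a_priori_snr):
    """Classical Wiener gain xi / (1 + xi), with xi the a priori SNR (linear scale)."""
    return a_priori_snr / (1.0 + a_priori_snr)


def wiener_amplitude(noisy_amp, speech_power, noise_power):
    """Apply the Wiener gain to a noisy amplitude, given assumed-known powers."""
    xi = speech_power / noise_power
    return wiener_gain(xi) * noisy_amp


if __name__ == "__main__":
    print(mss_amplitude(3.0, 1.0))          # noisy amplitude 3, noise amplitude 1
    print(wiener_amplitude(2.0, 3.0, 1.0))  # a priori SNR of 3
```

In practice the noise amplitude and the a priori SNR have to be estimated from the noisy signal; the sketch treats them as given to keep the two gain rules visible.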

Keywords :

Wiener filters; gamma distribution; least mean squares methods; maximum likelihood estimation; noise (working environment); phase estimation; spectral analysis; speech enhancement; stochastic processes; GGD; MMSE; Noizeus database; Wiener filter; additive superposition; generalized Gamma distributions; log-spectral MMSE estimator; magnitude spectral subtracter; maximum a posteriori; maximum likelihood estimation; minimum mean square error; noise distributions; optimal STSA estimators; phase equivalence estimation; short-time spectral amplitude; signal-to-noise ratio; speech enhancement; statistical shape parameters; stochastic process; Mathematical model; Maximum likelihood estimation; Noise; Speech enhancement; Generalized gamma distribution (GGD); noise suppression; short-time spectral amplitude (STSA) estimation; spectral subtraction; speech enhancement;

Language :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2011.2156784

Filename :

5771053

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1520602