Speech modeling and voiced/unvoiced/mixed/silence speech segmentation with fractionally Gaussian noise based models

Author

Oveisgharan, Sh.; Shamsollahi, M.B.

Author_Institution

Dept. of Electr. Eng., Sharif Univ. of Technol., Tehran, Iran

Volume

1

fYear

2004

fDate

17-21 May 2004

Abstract

The ARMA-filtered fractionally differenced Gaussian noise (FdGn) model and a new AR-filtered FdGn added-up model are applied to a speech signal, and the performance of their parameters in voiced/unvoiced/mixed/silence (V/U/M/S) speech classification is evaluated against the zero crossing rate (ZCR) feature. Two methods were applied to estimate the parameters of the AR-filtered FdGn model: the iterative maximum likelihood (ML) method of Tewfik (1993) and a new, computationally efficient linear minimum square error (LMSE) algorithm. Two approaches were likewise implemented to estimate the parameters of the new added-up model: an expectation-maximization (EM) based approach and an iterative MSE approach. The described models and methods were applied both to the speech signal and to its real cepstrum. Based on the J₁ parameter, the performance of the described models in V/U/M/S speech classification was obtained in this order: added-up model on the real cepstrum of speech, filtered FdGn model on the real cepstrum of speech (LMSE method), filtered FdGn model on speech (LMSE method), ZCR, and filtered FdGn model on speech (Tewfik method).
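For readers unfamiliar with the two signal-level quantities named in the abstract, the following is a minimal sketch of a per-frame zero crossing rate and a real cepstrum, using their common textbook definitions. It is not the authors' implementation: the function names, the `eps` guard, the 8 kHz sampling rate, and the 30 ms test frames are all assumptions chosen for illustration, and the FdGn models and the J₁ criterion are not reproduced here.

```python
import numpy as np


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (textbook ZCR)."""
    signs = np.signbit(np.asarray(frame, dtype=float))
    return float(np.mean(signs[1:] != signs[:-1]))


def real_cepstrum(frame, eps=1e-12):
    """Inverse FFT of the log magnitude spectrum.

    eps is an assumed small constant that avoids log(0) on empty bins.
    """
    spectrum = np.fft.fft(np.asarray(frame, dtype=float))
    return np.fft.ifft(np.log(np.abs(spectrum) + eps)).real


# Hypothetical 30 ms frames at an assumed 8 kHz sampling rate.
fs = 8000
t = np.arange(240) / fs
tone_frame = np.sin(2 * np.pi * 100 * t)                    # periodic, voiced-like
noise_frame = np.random.default_rng(0).standard_normal(240)  # noise-like

zcr_tone = zero_crossing_rate(tone_frame)
zcr_noise = zero_crossing_rate(noise_frame)
cepstrum_noise = real_cepstrum(noise_frame)
```

In this sketch the low-frequency periodic frame yields a much lower ZCR than the noise frame, which is the usual intuition behind using ZCR as a baseline voiced/unvoiced feature. The real cepstrum of a real-valued frame is itself real and symmetric about its midpoint.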

Keywords

Gaussian noise; autoregressive moving average processes; cepstral analysis; feature extraction; iterative methods; least squares approximations; maximum likelihood estimation; signal classification; speech processing; AR filtered FdGn added up model; ARMA filtered fractionally differenced Gaussian noise; EM based approach; FdGn model; ML method; V/U/M/S speech classification; ZCR feature; computationally efficient LMSE algorithm; expectation-maximization based approach; fractionally Gaussian noise based models; iterative MSE; iterative maximum likelihood; linear minimum square error; parameter estimation; performance; speech modeling; speech segmentation; speech signal; speech signal cepstrum; speech unvoiced/voiced/mixed/silence classification; voiced/unvoiced/mixed/silence speech; zero crossing rate; Cepstrum; Filtering; Nonlinear filters; Speech analysis; Speech enhancement; White noise

fLanguage

English

Publisher

IEEE

Conference_Title

Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on

ISSN

1520-6149

Print_ISBN

0-7803-8484-9

Type

conf

DOI

10.1109/ICASSP.2004.1326060

Filename

1326060