Hierarchical clustering and robust identification for block-based autoregressive speech parameter estimation

Author

Ruofei Chen; Cheung-Fat Chan

Author_Institution

Dept. of Electron. Eng., City Univ. of Hong Kong, Kowloon, China

fYear

2012

fDate

5-8 Dec. 2012

Firstpage

103

Lastpage

107

Abstract

Given accurate system parameters, such as the state transition matrix F and the corruption mapping matrix H, clean speech autoregressive (AR) parameters can be estimated effectively from a series of noisy observations with Kalman filtering. In this paper, we address several fundamental issues in linear dynamical system (LDS) based AR parameter estimation. A hierarchical time series clustering scheme is devised to group speech blocks that genuinely share similar trajectories and corruption types. In addition, a correlated robust identification scheme using an a posteriori signal-to-noise ratio (SNR) mask is proposed to improve identification accuracy. The effectiveness of the proposed clustering and identification schemes is evaluated in terms of the spectral distortion between the Kalman estimates and the true clean speech parameters. Significant improvement is observed over the original matrix quantization (MQ) based approach. The proposed scheme is also successfully applied to model-based speech enhancement, and is expected to be effective for robust identification in various codebook-driven speech applications.
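The abstract's premise is that, once F and H are known, clean AR parameters follow from standard Kalman filtering of the model x_t = F x_{t-1} + w_t, y_t = H x_t + v_t. The sketch below shows only that generic filter. The abstract does not say which AR parameterization forms the state, how F and H are obtained per cluster, or how the noise covariances are set, so the function name `kalman_filter` and the covariances `Q` and `R` are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def kalman_filter(ys, F, H, Q, R, x0, P0):
    """Generic Kalman filter (assumed model, not the paper's exact setup).

    State model:       x_t = F x_{t-1} + w_t,  w_t ~ N(0, Q)
    Observation model: y_t = H x_t + v_t,      v_t ~ N(0, R)
    Returns the filtered state estimate after each observation.
    """
    x, P = x0.astype(float).copy(), P0.astype(float).copy()
    n = x.shape[0]
    estimates = []
    for y in ys:
        # Predict: propagate the state and its covariance through F.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update: innovation covariance S and Kalman gain K = P H^T S^{-1}.
        S = H @ P @ H.T + R
        K = np.linalg.solve(S.T, H @ P.T).T
        x = x + K @ (y - H @ x)
        P = (np.eye(n) - K @ H) @ P
        estimates.append(x.copy())
    return np.array(estimates)
```

For example, with F = H = I and small Q, repeated noisy observations of a fixed two-coefficient AR vector drive the estimate toward that vector; in the block-based setting described above, F and H would instead come from the identified cluster.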

Keywords

Kalman filters; autoregressive processes; matrix algebra; pattern clustering; speech enhancement; time series; Kalman filtering; LDS; MQ; SNR; a posteriori signal-to-noise mask; block-based autoregressive speech parameter estimation; clean speech autoregressive parameters; codebook driven speech applications; corruption mapping matrix; hierarchical time series clustering scheme; linear dynamical system based AR parameter estimation; matrix quantization based approach; model-based speech enhancement application; robust identification; state transition matrix; Estimation; Noise measurement; Signal to noise ratio; Speech; Trajectory; Vectors; autoregressive; clustering; identification; linear dynamical system; time series;

fLanguage

English

Publisher

IEEE

Conference_Title

Chinese Spoken Language Processing (ISCSLP), 2012 8th International Symposium on

Conference_Location

Kowloon

Print_ISBN

978-1-4673-2506-6

Electronic_ISBN

978-1-4673-2505-9

Type

conf

DOI

10.1109/ISCSLP.2012.6423482

Filename

6423482