Regional Information Center for Science and Technology - An Improved VTS Feature Compensation using Mixture Models of Distortion and IVN Training for Noisy Speech Recognition

DocumentCode :

106935

Title :

An Improved VTS Feature Compensation using Mixture Models of Distortion and IVN Training for Noisy Speech Recognition

Author :

Jun Du ; Qiang Huo

Author_Institution :

Nat. Eng. Lab. for Speech & Language Inf. Process. (NEL-SLIP), Univ. of Sci. & Technol. of China, Hefei, China

Volume :

Issue :

fYear :

2014

fDate :

Nov. 2014

Firstpage :

1601

Lastpage :

1611

Abstract :

In our previous work, we proposed a feature compensation approach using high-order vector Taylor series (VTS) approximation for noisy speech recognition. In this paper, we report new progress on making it more powerful and practical in real applications. First, mixtures of densities are used to enhance the distortion models of both additive noise and convolutional distortion. New formulations for maximum likelihood (ML) estimation of distortion model parameters and minimum mean squared error (MMSE) estimation of clean speech are derived and presented. Second, we improve the feature compensation in both efficiency and accuracy by applying higher-order information of the VTS approximation only to the noisy speech mean parameters, and by applying a temporal smoothing operation to the posterior probability of Gaussian mixture components in clean speech estimation. Finally, we design a procedure to perform irrelevant variability normalization (IVN) based joint training of a reference Gaussian mixture model (GMM) for feature compensation and hidden Markov models (HMMs) for acoustic modeling using VTS-based feature compensation. The effectiveness of our proposed approach is confirmed by experiments on the Aurora3 benchmark database for a real-world in-vehicle connected digits recognition task. Compared with the ETSI advanced front-end, our approach achieves significant recognition accuracy improvement across three “training-testing” conditions for four languages.
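The abstract mentions temporal smoothing of Gaussian component posteriors followed by MMSE estimation of clean speech. The following is only a minimal illustrative sketch of that general idea, not the authors' formulation: it assumes a simple moving-average smoother and the common zeroth-order MMSE form in which the clean estimate is the noisy feature minus a posterior-weighted mean offset. The function names `smooth_posteriors` and `mmse_clean_estimate`, the `half_window` parameter, and the `offsets` array are hypothetical and do not appear in the record.

```python
import numpy as np

def smooth_posteriors(post, half_window=1):
    """Moving-average smoothing of component posteriors over time.

    post: array of shape (T, M), one posterior distribution per frame.
    half_window: frames averaged on each side (assumed smoother, not
    necessarily the one used in the paper).
    """
    T, _ = post.shape
    smoothed = np.empty_like(post, dtype=float)
    for t in range(T):
        lo = max(0, t - half_window)
        hi = min(T, t + half_window + 1)
        smoothed[t] = post[lo:hi].mean(axis=0)
    # Renormalize so every frame remains a valid distribution.
    return smoothed / smoothed.sum(axis=1, keepdims=True)

def mmse_clean_estimate(noisy, post, offsets):
    """Assumed zeroth-order MMSE form: x_hat[t] = y[t] - sum_m post[t, m] * offsets[m].

    noisy: (T, D) noisy features; offsets: (M, D) per-component
    differences between noisy and clean means (hypothetical input).
    """
    return noisy - post @ offsets

# Toy data: 3 frames, 2 components, 2 feature dimensions.
post = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
noisy = np.full((3, 2), 5.0)
offsets = np.array([[1.0, 1.0], [3.0, 3.0]])
smoothed = smooth_posteriors(post, half_window=1)
x_hat = mmse_clean_estimate(noisy, smoothed, offsets)
```

Smoothing before the weighted sum keeps an isolated frame with a different dominant component from producing an abrupt jump in the compensated feature trajectory.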

Keywords :

Gaussian processes; hidden Markov models; least mean squares methods; maximum likelihood estimation; mixture models; series (mathematics); speech recognition; Aurora3 benchmark database; Gaussian mixture component; IVN training; VTS feature compensation; acoustic modeling; additive noise; convolutional distortion; distortion mixture model; distortion model parameter; hidden Markov model; high order vector Taylor series approximation; in-vehicle connected digits recognition task; irrelevant variability normalization; maximum likelihood estimation; minimum mean squared error estimation; noisy speech recognition; posterior probability; reference Gaussian mixture model; temporal smoothing operation; Approximation methods; Estimation; Hidden Markov models; Nonlinear distortion; Speech; Training; Vectors; Feature compensation; irrelevant variability normalization; mixture model of distortion; noisy speech recognition; vector Taylor series;

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE/ACM Transactions on

Publisher :

IEEE

ISSN :

2329-9290

Type :

jour

DOI :

10.1109/TASLP.2014.2341912

Filename :

6862902

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=106935