Regional Information Center for Science and Technology - Maximum kurtosis beamforming with a subspace filter for distant speech recognition

DocumentCode :

3485129

Title :

Maximum kurtosis beamforming with a subspace filter for distant speech recognition

Author :

Kumatani, Kenichi ; McDonough, John ; Raj, Bhiksha

Author_Institution :

Disney Res., Pittsburgh, Pittsburgh, PA, USA

fYear :

2011

fDate :

11-15 Dec. 2011

Firstpage :

179

Lastpage :

184

Abstract :

This paper presents a new beamforming method for distant speech recognition (DSR). The dominant mode subspace is considered in order to efficiently estimate the active weight vectors for maximum kurtosis (MK) beamforming with the generalized sidelobe canceler (GSC). We demonstrated in [1], [2], [3] that the beamforming method based on the maximum kurtosis criterion can remove reverberant and noise effects without the signal cancellation encountered in conventional beamforming algorithms. The MK beamforming algorithm, however, requires a relatively large amount of data to estimate the active weight vector reliably because it relies on a numerical optimization algorithm. To achieve efficient estimation, we propose cascading the subspace (eigenspace) filter [4, Section 6.8] with the active weight vector. The subspace filter decomposes the output of the blocking matrix into directional signals and ambient noise components. The ambient noise components are then averaged and subtracted from the beamformer's output, which leads to reliable estimation as well as a significant reduction in computation. We show the effectiveness of our method through a set of distant speech recognition experiments on real microphone array data captured in a real environment. Our new beamforming algorithm outperformed the conventional beamforming techniques, achieving a word error rate (WER) of 5.3%, which is comparable to the WER of 4.2% obtained with a close-talking microphone. Moreover, it achieved better recognition performance with a smaller amount of adaptation data than the conventional MK beamformer.
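The decomposition step described in the abstract can be illustrated with a minimal NumPy sketch. It is one reading of the abstract, not the authors' implementation. The function names (`blocking_matrix`, `subspace_filter`), the uniform quiescent weights `w_q`, the simulated narrowband array data, and the choice of one dominant mode are all assumptions made for illustration. The kurtosis-based optimization of the active weight vector is not shown.

```python
import numpy as np

def blocking_matrix(w_q):
    """Return an M x (M-1) orthonormal matrix B with B^H w_q = 0.

    Assumption: any orthonormal basis of the null space of w_q^H is
    acceptable as a GSC blocking matrix for this illustration.
    """
    M = w_q.shape[0]
    # SVD of the 1 x M row vector w_q^H; the right-singular vectors
    # after the first one span its null space.
    _, _, vh = np.linalg.svd(w_q.conj().reshape(1, M))
    return vh[1:].conj().T

def subspace_filter(Z, num_dominant):
    """Split blocking-matrix outputs Z (channels x frames) into
    dominant-mode (directional) components and one averaged
    ambient-noise component. Returns (num_dominant + 1) x frames.
    """
    cov = Z @ Z.conj().T / Z.shape[1]          # sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues ascending
    order = np.argsort(eigvals)[::-1]          # largest first
    U = eigvecs[:, order]
    U_dom, U_noise = U[:, :num_dominant], U[:, num_dominant:]
    directional = U_dom.conj().T @ Z
    # Average the remaining (ambient noise) components into one channel.
    ambient = (U_noise.conj().T @ Z).mean(axis=0, keepdims=True)
    return np.vstack([directional, ambient])

# Hypothetical narrowband test data: one directional interferer
# plus weak spatially white noise on an 8-microphone linear array.
rng = np.random.default_rng(0)
M, T = 8, 4000
w_q = np.ones(M, dtype=complex) / M            # broadside look direction
B = blocking_matrix(w_q)
steer = np.exp(1j * np.pi * np.arange(M) * np.sin(np.deg2rad(40.0)))
interferer = rng.laplace(size=T) * steer[:, None]
noise = 0.1 * (rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T)))
X = interferer + noise

Z = B.conj().T @ X                             # blocking-matrix output
F = subspace_filter(Z, num_dominant=1)         # 2 x T instead of 7 x T
```

In this sketch the active weight vector would act on the two rows of `F` rather than on all seven blocking-matrix outputs, which is one way to see why fewer parameters need to be estimated from the adaptation data.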

Keywords :

array signal processing; eigenvalues and eigenfunctions; error statistics; filtering theory; microphone arrays; optimisation; speech recognition; MK beamforming; WER; active weight vector; adaptation data; ambient noise component; blocking matrix; close-talking microphone; directional signal; distant speech recognition; dominant mode subspace; eigenspace filter; generalized sidelobe canceler; maximum kurtosis beamforming; maximum kurtosis criterion; microphone array; noise effect; numerical optimization algorithm; recognition performance; reverberant effect; subspace filter; word error rate; Array signal processing; Eigenvalues and eigenfunctions; Microphones; Noise; Speech; Speech recognition; Vectors;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on

Conference_Location :

Waikoloa, HI

Print_ISBN :

978-1-4673-0365-1

Electronic_ISBN :

978-1-4673-0366-8

Type :

conf

DOI :

10.1109/ASRU.2011.6163927

Filename :

6163927

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3485129