Regional Information Center for Science and Technology (RICeST) - Sparse Auditory Reproducing Kernel (SPARK) Features for Noise-Robust Speech Recognition

DocumentCode :

1395359

Title :

Sparse Auditory Reproducing Kernel (SPARK) Features for Noise-Robust Speech Recognition

Author :

Fazel, Amin ; Chakrabartty, Shantanu

Author_Institution :

Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI, USA

Volume :

Issue :

fYear :

2012

fDate :

5/1/2012

Firstpage :

1362

Lastpage :

1371

Abstract :

In this paper, we present a novel speech feature extraction algorithm based on a hierarchical combination of auditory similarity and pooling functions. The computationally efficient features, known as “Sparse Auditory Reproducing Kernel” (SPARK) coefficients, are extracted under the hypothesis that the noise-robust information in a speech signal is embedded in a reproducing kernel Hilbert space (RKHS) spanned by overcomplete, nonlinear, and time-shifted gammatone basis functions. The feature extraction algorithm first computes a kernel-based similarity between the speech signal and the time-shifted gammatone functions, then prunes the features using a simple pooling technique (the “MAX” operation). We describe the effect of different hyper-parameters and kernel functions on the performance of a SPARK-based speech recognizer. Experimental results on the standard AURORA2 dataset demonstrate that the SPARK-based speech recognizer delivers consistent improvements in word accuracy over a baseline speech recognizer trained using the standard ETSI STQ WI008 DSR features.
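
The two stages named in the abstract (kernel-based similarity against time-shifted gammatone functions, then MAX pooling) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the polynomial kernel, the ERB-based bandwidth, the 16 ms gammatone length, and the non-overlapping 200-sample frames are all assumptions chosen only to keep the example small and runnable with the Python standard library.

```python
import math


def gammatone(fc, fs, duration=0.016, order=4):
    """Unit-norm gammatone: t^(order-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t).

    The bandwidth b uses the common ERB approximation scaled by 1.019;
    this is an assumption, not a value taken from the paper.
    """
    b = 1.019 * 24.7 * (4.37 * fc / 1000.0 + 1.0)
    n = max(1, round(duration * fs))
    g = [
        (k / fs) ** (order - 1)
        * math.exp(-2.0 * math.pi * b * k / fs)
        * math.cos(2.0 * math.pi * fc * k / fs)
        for k in range(n)
    ]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]


def poly_kernel(x, g, degree=2):
    """Illustrative polynomial kernel (1 + <x, g>)^degree (assumed choice)."""
    return (1.0 + sum(a * b for a, b in zip(x, g))) ** degree


def spark_like_features(signal, fs, centre_freqs, frame_len=200, kernel=poly_kernel):
    """One feature per (frame, centre frequency).

    For each frame, the kernel similarity is evaluated at every time shift
    of the gammatone function inside the frame, and only the maximum over
    shifts is kept (the MAX pooling step).
    """
    bank = [gammatone(fc, fs) for fc in centre_freqs]
    features = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        row = []
        for g in bank:
            sims = [
                kernel(frame[tau:tau + len(g)], g)
                for tau in range(frame_len - len(g) + 1)
            ]
            row.append(max(sims))  # MAX pooling over time shifts
        features.append(row)
    return features
```

Calling `spark_like_features(x, 8000, [500.0, 1000.0, 2000.0])` on a list of samples `x` returns one row of pooled similarities per frame. Swapping `kernel` for another function of two equal-length vectors is how different kernel choices would be compared in this sketch.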

Keywords :

Hilbert spaces; audio signal processing; feature extraction; hearing; speech recognition; MAX operation; RKHS; SPARK based speech recognizer; SPARK coefficients; SPARK features; auditory similarity; baseline speech recognizer; computing kernel based similarity; feature pruning; gammatone basis functions; hierarchical combination; hyper-parameters; kernel functions; noise-robust information; noise-robust speech recognition; pooling functions; pooling technique; reproducing kernel Hilbert space; sparse auditory reproducing kernel features; speech feature extraction algorithm; speech signal; standard AURORA2 dataset; standard ETSI STQ WI008 DSR features; time-shifted gammatone functions; word-accuracy; Feature extraction; Kernel; Psychoacoustic models; Sparks; Speech; Speech recognition; Vectors; Auditory HMAX; gammatone functions; reproducing kernel Hilbert space (RKHS); robust speech recognition; sparse features;

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2011.2179294

Filename :

6099594

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1395359