Title :
Noise-robust speech recognition with exemplar-based sparse representations using Alpha-Beta divergence
Author :
Yilmaz, Emre ; Gemmeke, Jori F. ; Van hamme, Hugo
Author_Institution :
Dept. ESAT, KU Leuven, Leuven, Belgium
Abstract :
In this paper, we investigate the performance of a noise-robust sparse representations (SR)-based recognizer that uses the Alpha-Beta (AB)-divergence to compare noisy speech segments with exemplars. The baseline recognizer, which approximates noisy speech segments as a linear combination of variable-length speech and noise exemplars, uses the generalized Kullback-Leibler divergence to quantify the approximation quality. Because the recognizer incorporates a reconstruction error-based back-end, its recognition performance depends strongly on how well the divergence measure matches the speech features used. With its two tuning parameters, α and β, the AB-divergence provides improved robustness against background noise and outliers. These parameters can be adjusted for better performance depending on the distribution of speech and noise exemplars in the high-dimensional feature space. Moreover, various well-known distance/divergence measures, such as the Euclidean distance, generalized Kullback-Leibler divergence, Itakura-Saito divergence and Hellinger distance, are special cases of the AB-divergence for particular (α, β) values. The goal of this work is to identify the optimal divergence for mel-scaled magnitude spectral features by performing recognition experiments at several SNR levels using different (α, β) pairs. The results demonstrate the effectiveness of the AB-divergence compared to the generalized Kullback-Leibler divergence, especially at lower SNR levels.
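The special cases named in the abstract can be checked numerically. The sketch below is a minimal illustration, not the authors' implementation: it assumes the commonly cited definition of the AB-divergence for α, β and α + β all nonzero, and the function names `ab_divergence` and `generalized_kl` are hypothetical. The scaling constants (for example the factor 1/2 on the squared Euclidean distance) follow from that assumed definition and may differ from the convention used in the paper.

```python
import numpy as np


def ab_divergence(p, q, alpha, beta):
    """AB-divergence between non-negative vectors p and q.

    Assumed definition (general case only, alpha, beta, alpha+beta != 0):
      D = -1/(alpha*beta) * sum( p^alpha * q^beta
                                 - alpha/(alpha+beta) * p^(alpha+beta)
                                 - beta/(alpha+beta)  * q^(alpha+beta) )
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if alpha == 0 or beta == 0 or alpha + beta == 0:
        raise ValueError("this sketch covers only alpha, beta, alpha+beta != 0")
    s = alpha + beta
    terms = p**alpha * q**beta - (alpha / s) * p**s - (beta / s) * q**s
    return float(-np.sum(terms) / (alpha * beta))


def generalized_kl(p, q):
    """Generalized Kullback-Leibler divergence for strictly positive p, q."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q) - p + q))


# Illustrative vectors (made-up values, not data from the paper).
p = np.array([1.0, 2.0, 3.0])
q = np.array([1.5, 1.0, 2.5])

# (alpha, beta) = (1, 1): half the squared Euclidean distance.
d_euclid = ab_divergence(p, q, 1.0, 1.0)

# (alpha, beta) = (0.5, 0.5): a scaled squared Hellinger distance.
d_hellinger = ab_divergence(p, q, 0.5, 0.5)

# (alpha, beta) = (1, beta -> 0): approaches the generalized KL divergence.
d_near_kl = ab_divergence(p, q, 1.0, 1e-6)
```

Under this assumed definition, the limiting cases β = 0 and α = −β (the latter giving the Itakura-Saito divergence at α = 1) need separate closed-form expressions, which the sketch leaves out; the generalized KL case is only approached here by taking a small β.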
Keywords :
noise; speech processing; speech recognition; AB-divergence; Euclidean distance; Hellinger distance; Itakura-Saito divergence; Kullback-Leibler divergence; SNR level; SR-based recognizer; alpha-beta divergence; approximation quality; baseline recognizer; divergence measure; exemplar-based sparse representation; high-dimensional feature space; mel-scaled magnitude spectral features; noise exemplar; noise-robust speech recognition; noisy speech segment; optimal divergence; recognition performance; reconstruction error; speech distribution; speech feature; Accuracy; Dictionaries; Noise; Noise measurement; Speech; Speech recognition; Vectors; alpha-beta divergence; exemplar-based speech recognition; noise-robustness; sparse representations;
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
DOI :
10.1109/ICASSP.2014.6854655