DocumentCode
3161959
Title
Modelling spectro-temporal dynamics in factorisation-based noise-robust automatic speech recognition
Author
Hurmalainen, Antti ; Virtanen, Tuomas
Author_Institution
Tampere Univ. of Technol., Tampere, Finland
fYear
2012
fDate
25-30 March 2012
Firstpage
4113
Lastpage
4116
Abstract
Non-negative spectral factorisation has been used successfully for separation of speech and noise in automatic speech recognition, both in feature-enhancing front-ends and in direct classification. In this work, we propose employing spectro-temporal 2D filters to model dynamic properties of Mel-scale spectrogram patterns in addition to static magnitude features. The results are evaluated using an exemplar-based sparse classifier on the CHiME noisy speech database. After optimisation of static features and modelling of temporal dynamics with derivative features, we achieve 87.4% average score over SNRs from 9 to -6 dB, reducing the word error rate by 28.1% from our previous static-only features.
Keywords
filters; noise; optimisation; speech recognition; CHiME noisy speech database; Mel-scale spectrogram; SNR; direct classification; exemplar-based sparse classifier; factorisation-based noise-robust automatic speech recognition; feature-enhancing front-ends; noise separation; spectro-temporal 2D filters; spectro-temporal dynamic model; speech separation; static magnitude features; Feature extraction; Noise; Noise measurement; Spectrogram; Speech; Speech recognition; Vectors; Automatic speech recognition; exemplar-based; noise robustness; spectral factorisation
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
Conference_Location
Kyoto
ISSN
1520-6149
Print_ISBN
978-1-4673-0045-2
Electronic_ISBN
1520-6149
Type
conf
DOI
10.1109/ICASSP.2012.6288823
Filename
6288823