DocumentCode :
3425542
Title :
Unsupervised learning of auditory filter banks using non-negative matrix factorisation
Author :
Bertrand, Alexander ; Demuynck, Kris ; Stouten, Veronique ; Hamme, Hugo Van
Author_Institution :
Dept. ESAT Kasteelpark Arenberg 10, Katholieke Univ. Leuven, Leuven
fYear :
2008
fDate :
March 31 2008-April 4 2008
Firstpage :
4713
Lastpage :
4716
Abstract :
Non-negative matrix factorisation (NMF) is an unsupervised learning technique that decomposes a non-negative data matrix into a product of two lower-rank non-negative matrices. The non-negativity constraint results in a parts-based and often sparse representation of the data. We use NMF to factorise a matrix of spectral slices of continuous speech to automatically find a feature set for speech recognition. The resulting decomposition yields a filter bank design with remarkable similarities to perceptually motivated designs, supporting the hypothesis that human hearing and speech production are well matched to each other. We point out that the divergence cost criterion used by NMF is linearly dependent on energy, which may influence the design. We argue, however, that this does not significantly affect the interpretation of our results. Furthermore, we compare our filter bank with several hearing models found in the literature. Evaluating the filter bank for speech recognition shows that it achieves the same recognition performance as classical MEL-based features.
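The factorisation described in the abstract can be illustrated with a minimal sketch. The abstract does not state which update rule the authors used, so the code below assumes the standard multiplicative updates for the generalised Kullback-Leibler divergence; the function names `nmf_kl` and `kl_divergence`, the matrix names `V`, `W`, `H`, and all parameter values are hypothetical choices made for illustration only.

```python
import numpy as np


def kl_divergence(V, WH, eps=1e-12):
    """Generalised KL divergence D(V || WH), summed over all entries."""
    return float(np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH))


def nmf_kl(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Factorise non-negative V (n_rows x n_cols) as W @ H with W, H >= 0.

    Assumption: multiplicative updates for the KL divergence; this is an
    illustrative sketch, not the procedure confirmed by the record above.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    W = rng.random((n_rows, rank)) + eps
    H = rng.random((rank, n_cols)) + eps
    for _ in range(n_iter):
        # Update H while W is held fixed.
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        # Update W while H is held fixed.
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H
```

Because every update multiplies non-negative quantities, `W` and `H` stay non-negative throughout. The divergence also scales with the data: multiplying both `V` and `W @ H` by a constant multiplies `kl_divergence` by the same constant, which is one way to read the abstract's remark that the cost criterion depends linearly on energy. In the setting of the abstract, `V` would hold spectral slices of speech; which factor plays the role of the filter bank is not stated in this record.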
Keywords :
channel bank filters; matrix decomposition; speech processing; speech recognition; unsupervised learning; auditory filter banks; continuous speech; hearing models; nonnegative data matrix; nonnegative matrix factorisation; nonnegativity constraint; Auditory system; Filter bank; Humans; Sparse matrices; Speech analysis; Vectors; Feature extraction; Non-negative matrix decomposition
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on
Conference_Location :
Las Vegas, NV
ISSN :
1520-6149
Print_ISBN :
978-1-4244-1483-3
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2008.4518709
Filename :
4518709