DocumentCode
3425542
Title
Unsupervised learning of auditory filter banks using non-negative matrix factorisation
Author
Bertrand, Alexander ; Demuynck, Kris ; Stouten, Veronique ; Van Hamme, Hugo
Author_Institution
Dept. ESAT, Kasteelpark Arenberg 10, Katholieke Univ. Leuven, Leuven
fYear
2008
fDate
March 31 2008-April 4 2008
Firstpage
4713
Lastpage
4716
Abstract
Non-negative matrix factorisation (NMF) is an unsupervised learning technique that decomposes a non-negative data matrix into a product of two lower-rank non-negative matrices. The non-negativity constraint results in a parts-based and often sparse representation of the data. We use NMF to factorise a matrix of spectral slices of continuous speech to automatically find a feature set for speech recognition. The resulting decomposition yields a filter bank design with remarkable similarities to perceptually motivated designs, supporting the hypothesis that human hearing and speech production are well matched to each other. We point out that the divergence cost criterion used by NMF is linearly dependent on energy, which may influence the design. We argue, however, that this does not significantly affect the interpretation of our results. Furthermore, we compare our filter bank with several hearing models found in the literature. Evaluating the filter bank for speech recognition shows that it achieves the same recognition performance as classical MEL-based features.
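The factorisation described in the abstract can be illustrated with a minimal sketch. It assumes the widely used multiplicative update rules for the generalised Kullback-Leibler divergence; the abstract does not say which update scheme the authors used, so this is an illustration and not their implementation. The function names `nmf_kl` and `kl_divergence`, the matrix sizes, the rank, and the random matrix standing in for speech spectra are all hypothetical. The abstract also does not state which factor plays the role of the filter bank, so the sketch leaves that open.

```python
import numpy as np

def nmf_kl(V, rank, n_iter=200, eps=1e-12, seed=0):
    """Approximate non-negative V (n_bins x n_frames) as W @ H.

    Uses multiplicative updates for the generalised KL divergence
    (an assumed scheme; the abstract does not specify one).
    """
    rng = np.random.default_rng(seed)
    n_bins, n_frames = V.shape
    # Random non-negative initialisation of both factors.
    W = rng.random((n_bins, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    for _ in range(n_iter):
        WH = W @ H + eps
        # H_kj <- H_kj * sum_i W_ik V_ij / (WH)_ij / sum_i W_ik
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        WH = W @ H + eps
        # W_ik <- W_ik * sum_j H_kj V_ij / (WH)_ij / sum_j H_kj
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H

def kl_divergence(V, WH, eps=1e-12):
    """Generalised KL divergence D(V || WH), summed over all entries."""
    return float(np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH))

# Hypothetical stand-in for a matrix of spectral slices:
# 64 frequency bins x 500 frames of random non-negative values.
rng = np.random.default_rng(1)
V = rng.random((64, 500))
W, H = nmf_kl(V, rank=8)
d = kl_divergence(V, W @ H)
# Scaling data and model by the same factor scales the cost by that factor.
d_scaled = kl_divergence(3.0 * V, 3.0 * (W @ H))
```

Because the updates only multiply by non-negative ratios, both factors stay non-negative throughout. Comparing `d_scaled` with `3.0 * d` illustrates the linear dependence of the divergence on energy that the abstract mentions: high-energy slices contribute proportionally more to the cost.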
Keywords
channel bank filters; matrix decomposition; speech processing; speech recognition; unsupervised learning; auditory filter banks; continuous speech; hearing models; nonnegative data matrix; nonnegative matrix factorisation; nonnegativity constraint; Auditory system; Filter bank; Humans; Sparse matrices; Speech analysis; Vectors; Feature extraction; Non-negative matrix decomposition
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on
Conference_Location
Las Vegas, NV
ISSN
1520-6149
Print_ISBN
978-1-4244-1483-3
Electronic_ISBN
1520-6149
Type
conf
DOI
10.1109/ICASSP.2008.4518709
Filename
4518709