• DocumentCode
    3425542
  • Title
    Unsupervised learning of auditory filter banks using non-negative matrix factorisation
  • Author
    Bertrand, Alexander ; Demuynck, Kris ; Stouten, Veronique ; Hamme, Hugo Van
  • Author_Institution
    Dept. ESAT Kasteelpark Arenberg 10, Katholieke Univ. Leuven, Leuven
  • fYear
    2008
  • fDate
    March 31 2008-April 4 2008
  • Firstpage
    4713
  • Lastpage
    4716
  • Abstract
    Non-negative matrix factorisation (NMF) is an unsupervised learning technique that decomposes a non-negative data matrix into a product of two lower-rank non-negative matrices. The non-negativity constraint results in a parts-based and often sparse representation of the data. We use NMF to factorise a matrix with spectral slices of continuous speech to automatically find a feature set for speech recognition. The resulting decomposition yields a filter bank design with remarkable similarities to perceptually motivated designs, supporting the hypothesis that human hearing and speech production are well matched to each other. We point out that the divergence cost criterion used by NMF is linearly dependent on energy, which may influence the design. We argue, however, that this does not significantly affect the interpretation of our results. Furthermore, we compare our filter bank with several hearing models found in the literature. Evaluating the filter bank for speech recognition shows that the same recognition performance is achieved as with classical MEL-based features.
  • Keywords
    channel bank filters; matrix decomposition; speech processing; speech recognition; unsupervised learning; auditory filter banks; continuous speech; hearing models; nonnegative data matrix; nonnegative matrix factorisation; nonnegativity constraint; Auditory system; Filter bank; Humans; Sparse matrices; Speech analysis; Vectors; Feature extraction; Non-negative matrix decomposition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on
  • Conference_Location
    Las Vegas, NV
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4244-1483-3
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2008.4518709
  • Filename
    4518709
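
The decomposition described in the abstract can be illustrated with a minimal sketch. It assumes the widely used multiplicative update rules for the generalised Kullback-Leibler divergence; the record does not state which update rules or settings the authors used, so this is not their procedure. The function names `kl_divergence` and `nmf_kl`, the rank, the iteration count, and the random stand-in for a matrix of spectral slices are all hypothetical choices made for illustration.

```python
import numpy as np


def kl_divergence(V, R, eps=1e-12):
    """Generalised KL divergence D(V || R), summed over all matrix entries."""
    V = np.asarray(V, dtype=float)
    R = np.asarray(R, dtype=float)
    return float(np.sum(V * np.log((V + eps) / (R + eps)) - V + R))


def nmf_kl(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Factorise non-negative V (rows x cols) as W @ H with W, H >= 0.

    Uses multiplicative updates for the divergence cost (an assumption,
    not taken from the record). Columns of W act as learned basis
    vectors; H holds their non-negative activations.
    """
    V = np.asarray(V, dtype=float)
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    W = rng.random((n_rows, rank)) + eps
    H = rng.random((rank, n_cols)) + eps
    for _ in range(n_iter):
        R = W @ H + eps
        # Update activations: numerator W^T (V / WH), denominator column sums of W.
        H *= (W.T @ (V / R)) / (W.sum(axis=0)[:, None] + eps)
        R = W @ H + eps
        # Update basis vectors: numerator (V / WH) H^T, denominator row sums of H.
        W *= ((V / R) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H


if __name__ == "__main__":
    # Synthetic stand-in for a matrix of spectral slices (frequency bins x frames).
    rng = np.random.default_rng(1)
    V = rng.random((20, 50)) + 0.1
    W, H = nmf_kl(V, rank=4)
    print("divergence after fitting:", kl_divergence(V, W @ H))
    # Scaling data and reconstruction by the same factor scales the cost by
    # that factor, which is the linear energy dependence the abstract notes.
    print("cost ratio for 3x energy:",
          kl_divergence(3 * V, 3 * (W @ H)) / kl_divergence(V, W @ H))
```

Because the cost scales linearly with the energy of the data, high-energy slices contribute proportionally more to the fit than low-energy ones, which is the effect the abstract says may influence the resulting design.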