DocumentCode
180128
Title
End-to-end learning for music audio
Author
Dieleman, Sander ; Schrauwen, Benjamin
Author_Institution
Electronics and Information Systems Department, Ghent University, Ghent, Belgium
fYear
2014
fDate
4-9 May 2014
Firstpage
6964
Lastpage
6968
Abstract
Content-based music information retrieval tasks have traditionally been solved using engineered features and shallow processing architectures. In recent years, there has been increasing interest in using feature learning and deep architectures instead, thus reducing the required engineering effort and the need for prior knowledge. However, this new approach typically still relies on mid-level representations of music audio, e.g. spectrograms, instead of raw audio signals. In this paper, we investigate whether it is possible to apply feature learning directly to raw audio signals. We train convolutional neural networks using both approaches and compare their performance on an automatic tagging task. Although they do not outperform a spectrogram-based approach, the networks are able to autonomously discover frequency decompositions from raw audio, as well as phase- and translation-invariant feature representations.
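To make the abstract's idea concrete, below is a minimal illustrative sketch (not the authors' exact architecture) of a 1D convolutional network that maps raw waveforms to tag probabilities. It is written in PyTorch for readability; the class name `RawAudioTagger`, all layer sizes, the kernel/stride choices, and `n_tags=50` are assumptions for demonstration. The strided first convolution plays the role the paper attributes to the learned frequency decomposition, and the pooling layers provide the phase- and translation-invariance mentioned in the abstract.

```python
# Illustrative sketch only: a small 1D convnet for automatic tagging
# from raw audio. Layer sizes and hyperparameters are hypothetical.
import torch
import torch.nn as nn

class RawAudioTagger(nn.Module):
    def __init__(self, n_tags=50):
        super().__init__()
        self.features = nn.Sequential(
            # Strided convolution over the raw waveform: the learned
            # filters can act as a frequency decomposition, taking the
            # place of a fixed spectrogram front end.
            nn.Conv1d(1, 32, kernel_size=256, stride=256),
            nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=8),
            nn.ReLU(),
            nn.MaxPool1d(4),  # pooling yields translation invariance
            nn.Conv1d(32, 32, kernel_size=8),
            nn.ReLU(),
            nn.MaxPool1d(4),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool1d(1),  # summarize features over time
            nn.Flatten(),
            nn.Linear(32, n_tags),
            nn.Sigmoid(),  # independent per-tag probabilities
        )

    def forward(self, waveform):
        # waveform: (batch, 1, n_samples)
        return self.classifier(self.features(waveform))

# Example: a batch of two three-second clips at 16 kHz.
x = torch.randn(2, 1, 3 * 16000)
print(RawAudioTagger()(x).shape)  # torch.Size([2, 50])
```

A spectrogram-based variant of the same comparison would replace the first strided `Conv1d` with a fixed time-frequency transform and convolve over its output instead.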
Keywords
content-based retrieval; learning (artificial intelligence); music; automatic tagging task; content-based music information retrieval tasks; convolutional neural networks training; end-to-end learning; frequency decompositions; music audio; phase- and translation-invariant feature representations; raw audio; spectrogram-based approach; Computer architecture; Convolution; Music information retrieval; Neural networks; Spectrogram; Speech; automatic tagging; convolutional neural networks; end-to-end learning; feature learning; music information retrieval
fLanguage
English
Publisher
IEEE
Conference_Titel
2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Conference_Location
Florence, Italy
Type
conf
DOI
10.1109/ICASSP.2014.6854950
Filename
6854950