DocumentCode :
178248
Title :
Multi-resolution linear prediction based features for audio onset detection with bidirectional LSTM neural networks
Author :
Marchi, Erik ; Ferroni, Giacomo ; Eyben, Florian ; Gabrielli, Leonardo ; Squartini, Stefano ; Schuller, Bjorn
Author_Institution :
Machine Intell. & Signal Process. Group, Tech. Univ. München, München, Germany
fYear :
2014
fDate :
4-9 May 2014
Firstpage :
2164
Lastpage :
2168
Abstract :
A plethora of onset detection methods have been proposed in recent years. However, few attempts have been made at widely applicable approaches that achieve superior performance across different types of music with considerable temporal precision. In this paper, we present a multi-resolution approach based on the discrete wavelet transform and linear prediction filtering that improves the time resolution and performance of onset detection in different musical scenarios. In our approach, wavelet coefficients and forward prediction errors are combined with auditory spectral features and then processed by a bidirectional Long Short-Term Memory recurrent neural network, which acts as a reduction function. The network is trained on a large database of onset data covering various genres and onset types. We compare results with state-of-the-art methods on a dataset that includes the Bello, Glover and ISMIR 2004 Ballroom sets, and we conclude that our approach significantly outperforms existing methods in terms of F-measure. For pitched non-percussive music, an absolute improvement of 7.5% is reported.
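The two signal-level features named in the abstract, wavelet coefficients and forward prediction errors, can be illustrated with a minimal sketch. This is not the authors' implementation: the Haar wavelet, the least-squares fit, the predictor order, and the function names `haar_dwt`, `lpc_coefficients` and `forward_prediction_error` are all assumptions made for illustration. The auditory spectral features and the bidirectional LSTM are omitted.

```python
import numpy as np

def haar_dwt(x):
    """One level of a Haar DWT (assumed wavelet): returns (approximation, detail)."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:                      # pad to even length by repeating the last sample
        x = np.append(x, x[-1])
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)
    detail = (even - odd) / np.sqrt(2.0)
    return approx, detail

def _past_matrix(x, order):
    """Row i holds x[n-1], ..., x[n-order] for the target sample n = i + order."""
    return np.column_stack(
        [x[order - 1 - k : len(x) - 1 - k] for k in range(order)]
    )

def lpc_coefficients(x, order):
    """Least-squares forward predictor: x[n] ~ sum_k a[k] * x[n-1-k]."""
    x = np.asarray(x, dtype=float)
    a, *_ = np.linalg.lstsq(_past_matrix(x, order), x[order:], rcond=None)
    return a

def forward_prediction_error(x, a):
    """Error e[n] = x[n] - prediction, for n = order .. len(x)-1."""
    x = np.asarray(x, dtype=float)
    order = len(a)
    return x[order:] - _past_matrix(x, order) @ a

# Hypothetical usage: a steady sinusoid is predictable, an abrupt click is not.
n = np.arange(200)
clean = np.sin(2 * np.pi * 0.2 * n)
a = lpc_coefficients(clean, order=2)
clicked = clean.copy()
clicked[100] += 1.0                     # synthetic "onset"
err = forward_prediction_error(clicked, a)
peak_sample = int(np.argmax(np.abs(err))) + len(a)
```

In this toy case the prediction error stays near zero on the steady tone and rises around the inserted click, which is the intuition behind using forward prediction errors as an onset cue.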
Keywords :
audio signal processing; discrete wavelet transforms; filtering theory; prediction theory; recurrent neural nets; signal detection; signal resolution; audio onset detection methods; auditory spectral features; bidirectional LSTM neural networks; bidirectional long short-term memory recurrent neural network; discrete wavelet transform; forward prediction errors; linear prediction filtering; multi-resolution linear prediction based features; non percussive music; reduction function; time resolution; wavelet coefficients; Conferences; Discrete wavelet transforms; Feature extraction; Neural networks; Speech; Speech processing; Audio Onset Detection; Bidirectional Long Short-Term Memory; Discrete Wavelet Transform; Linear Prediction; Neural Networks
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
Type :
conf
DOI :
10.1109/ICASSP.2014.6853982
Filename :
6853982