DocumentCode :
1264783
Title :
Sparse Approximations for Drum Sound Classification
Author :
Scholler, Simon ; Purwins, Hendrik
Author_Institution :
Music Technol. Group, Univ. Pompeu Fabra, Barcelona, Spain
Volume :
5
Issue :
5
fYear :
2011
Firstpage :
933
Lastpage :
940
Abstract :
To date, little work has been done on using features from temporal approximations of signals for audio recognition. Time-frequency tradeoffs are an important issue in signal processing; sparse representations using overcomplete dictionaries may (or may not, depending on the dictionary) have more time-frequency flexibility than the standard short-time Fourier transform. Also, the precise temporal structure of signals cannot be captured by spectral-based feature methods. Here, we present a biologically inspired three-step process for audio classification: 1) Efficient atomic functions are learned in an unsupervised manner on mixtures of percussion sounds (drum phrases), optimizing the length as well as the shape of the atoms. 2) An analog spike model is used to sparsely approximate percussion sound signals (bass drum, snare drum, hi-hat). The spike model consists of temporally shifted versions of the learned atomic functions, each having a precise temporal position and amplitude. To obtain the decomposition given a set of atomic functions, matching pursuit is used. 3) Features are extracted from the resulting spike representation of the signal. The classification accuracy of our method using a support vector machine (SVM) in a 3-class database transfer task is 87.8%. Using gammatone functions instead of the learned sparse functions yields an even better classification rate of 97.6%. Testing the features on sounds containing additive white Gaussian noise reveals that sparse approximation features are far more robust to such distortions than our benchmark feature set of timbre descriptor (TD) features.
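The decomposition in step 2 can be illustrated with a minimal sketch of matching pursuit over temporally shifted atoms. This is not the authors' implementation: the function names (`gammatone`, `matching_pursuit`), the kernel duration, the center frequencies, and the fixed spike count are assumptions chosen for illustration, and hand-built gammatone kernels stand in for the learned atoms. Dictionary learning and the feature extraction of step 3 are not shown.

```python
import numpy as np

def gammatone(fc, fs, duration=0.02, order=4):
    """Unit-norm gammatone kernel (hypothetical helper, illustrative parameters)."""
    t = np.arange(int(duration * fs)) / fs
    # Bandwidth from the common ERB approximation; an assumption, not from the abstract.
    b = 1.019 * (24.7 + 0.108 * fc)
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.linalg.norm(g)

def matching_pursuit(signal, kernels, n_spikes=50):
    """Greedy sparse approximation with shifted unit-norm kernels.

    Returns a list of spikes (kernel index, sample position, amplitude)
    and the residual left after subtracting all selected atoms.
    The signal must be at least as long as the longest kernel.
    """
    residual = np.asarray(signal, dtype=float).copy()
    spikes = []
    for _ in range(n_spikes):
        best = None
        for k, atom in enumerate(kernels):
            # Inner product of the residual with the atom at every valid shift.
            corr = np.correlate(residual, atom, mode="valid")
            pos = int(np.argmax(np.abs(corr)))
            if best is None or abs(corr[pos]) > abs(best[2]):
                best = (k, pos, corr[pos])
        k, pos, amp = best
        # Subtract the best-matching shifted atom, scaled by its projection.
        residual[pos:pos + len(kernels[k])] -= amp * kernels[k]
        spikes.append((k, pos, float(amp)))
    return spikes, residual

# Toy usage: a synthetic signal built from two well-separated atoms.
fs = 8000
kernels = [gammatone(200.0, fs), gammatone(2000.0, fs)]
toy = np.zeros(1000)
toy[100:100 + len(kernels[0])] += 2.0 * kernels[0]
toy[600:600 + len(kernels[1])] -= 1.5 * kernels[1]
spikes, residual = matching_pursuit(toy, kernels, n_spikes=2)
```

Each returned spike carries a kernel index, a precise temporal position, and an amplitude, which mirrors the spike model described in the abstract. A practical version would typically stop on a residual-energy or signal-to-noise criterion rather than a fixed number of spikes.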
Keywords :
AWGN; audio signal processing; dictionaries; feature extraction; musical instruments; signal classification; support vector machines; unsupervised learning; 3-class database transfer task; SVM; additive white Gaussian noise; analog spike model; atomic function; audio classification; audio recognition; bass drum; drum phrase; drum sound classification; feature extraction; gammatone function; hi-hat drum; matching pursuit; overcomplete dictionary; percussion sound signal; signal processing; snare drum; sparse approximation; sparse representation; support vector machine; timbre descriptor; time-frequency tradeoff; unsupervised learning; Approximation methods; Atomic clocks; Databases; Dictionaries; Encoding; Matching pursuit algorithms; Signal to noise ratio; Dictionary learning; machine listening; matching pursuit; sound classification; sparse approximation; spike coding; unsupervised learning;
fLanguage :
English
Journal_Title :
Selected Topics in Signal Processing, IEEE Journal of
Publisher :
IEEE
ISSN :
1932-4553
Type :
jour
DOI :
10.1109/JSTSP.2011.2161264
Filename :
5940201