DocumentCode :
1756801
Title :
Learning Optimal Features for Polyphonic Audio-to-Score Alignment
Author :
Joder, Cyril ; Essid, Slim ; Richard, Gaël
Author_Institution :
Inst. for Human-Machine Commun., Tech. Univ. Munich, Munich, Germany
Volume :
21
Issue :
10
fYear :
2013
fDate :
Oct. 2013
Firstpage :
2118
Lastpage :
2128
Abstract :
This paper addresses the design of feature functions for the matching of a musical recording to the symbolic representation of the piece (the score). These feature functions are defined as dissimilarity measures between the audio observations and template vectors corresponding to the score. By expressing the template construction as a linear mapping from the symbolic to the audio representation, one can learn the feature functions by optimizing the linear transformation. In this paper, we explore two different learning strategies. The first one uses a best-fit criterion (minimum divergence), while the second one exploits a discriminative framework based on a Conditional Random Fields model (maximum likelihood criterion). We evaluate the influence of the feature functions in an audio-to-score alignment task, on a large database of popular and classical polyphonic music. The results show that with several types of models, using different temporal constraints, the learned mappings have the potential to outperform the classic heuristic mappings. Several representations of the audio observations, along with several distance functions, are compared in this alignment task. Our experiments favor the symmetric Kullback-Leibler divergence. Moreover, both the spectrogram and a CQT-based representation turn out to provide very accurate alignments, detecting more than 97% of the onsets with a precision of 100 ms with our most complex system.
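As an illustration of the feature functions described in the abstract, the following minimal sketch builds a template vector through a linear mapping from a symbolic note-activity vector and scores it against an audio observation with the symmetric Kullback-Leibler divergence. It is not the authors' implementation: the function names, the toy mapping matrix, and the flooring constant `eps` are all assumptions made for this example, and the mapping here is hand-written rather than learned.

```python
import math

def normalize(v, eps=1e-12):
    """Floor entries at eps (assumed constant) and scale the vector to sum to 1."""
    floored = [max(x, eps) for x in v]
    total = sum(floored)
    return [x / total for x in floored]

def build_template(mapping, note_vector):
    """Linear mapping from the symbolic to the audio representation.

    mapping     : rows are frequency bins, columns are pitches (hypothetical values)
    note_vector : 1 where a pitch is active in the score, 0 otherwise
    """
    return [sum(w * n for w, n in zip(row, note_vector)) for row in mapping]

def symmetric_kl(p, q):
    """Symmetric KL divergence KL(p||q) + KL(q||p) between normalized vectors."""
    p, q = normalize(p), normalize(q)
    return sum((a - b) * math.log(a / b) for a, b in zip(p, q))

# Toy setup: 4 frequency bins, 2 pitches; each pitch excites two bins.
mapping = [
    [1.0, 0.0],
    [0.5, 0.0],
    [0.0, 1.0],
    [0.0, 0.5],
]
observation = [0.9, 0.4, 0.05, 0.02]  # made-up spectrum dominated by pitch 0

template_pitch0 = build_template(mapping, [1, 0])
template_pitch1 = build_template(mapping, [0, 1])

d_match = symmetric_kl(observation, template_pitch0)
d_mismatch = symmetric_kl(observation, template_pitch1)
```

In this toy case the template for the pitch that actually sounds gives the smaller divergence, which is the behavior an alignment model would rely on. Learning the feature functions, as the abstract describes, would amount to optimizing the entries of `mapping` instead of fixing them by hand.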
Keywords :
audio signal processing; maximum likelihood estimation; signal representation; CQT-based representation; audio observations; conditional random fields model; discriminative framework; feature functions design; heuristic mappings; learning optimal features; linear transformation; maximum likelihood criterion; musical recording; polyphonic audio-to-score alignment; polyphonic music; spectrogram; symbolic representation; symmetric Kullback-Leibler divergence; template construction; template vectors; temporal constraints; Music information retrieval; audio-to-score alignment; conditional random fields; discriminative learning;
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2013.2266794
Filename :
6525340