• DocumentCode
    1756801
  • Title
    Learning Optimal Features for Polyphonic Audio-to-Score Alignment
  • Author
    Joder, Cyril ; Essid, Slim ; Richard, Gaël
  • Author_Institution
    Inst. for Human-Machine Commun., Tech. Univ. Munich, Munich, Germany
  • Volume
    21
  • Issue
    10
  • fYear
    2013
  • fDate
    Oct. 2013
  • Firstpage
    2118
  • Lastpage
    2128
  • Abstract
    This paper addresses the design of feature functions for the matching of a musical recording to the symbolic representation of the piece (the score). These feature functions are defined as dissimilarity measures between the audio observations and template vectors corresponding to the score. By expressing the template construction as a linear mapping from the symbolic to the audio representation, one can learn the feature functions by optimizing the linear transformation. In this paper, we explore two different learning strategies. The first one uses a best-fit criterion (minimum divergence), while the second one exploits a discriminative framework based on a Conditional Random Fields model (maximum likelihood criterion). We evaluate the influence of the feature functions in an audio-to-score alignment task, on a large database of popular and classical polyphonic music. The results show that with several types of models, using different temporal constraints, the learned mappings have the potential to outperform the classic heuristic mappings. Several representations of the audio observations, along with several distance functions are compared in this alignment task. Our experiments elect the symmetric Kullback-Leibler divergence. Moreover, both the spectrogram and a CQT-based representation turn out to provide very accurate alignments, detecting more than 97% of the onsets with a precision of 100 ms with our most complex system.
  • Keywords
    audio signal processing; maximum likelihood estimation; signal representation; CQT-based representation; audio observations; conditional random fields model; discriminative framework; feature functions design; heuristic mappings; learning optimal features; linear transformation; maximum likelihood criterion; musical recording; polyphonic audio-to-score alignment; polyphonic music; spectrogram; symbolic representation; symmetric Kullback-Leibler divergence; template construction; template vectors; temporal constraints; Music information retrieval; audio-to-score alignment; conditional random fields; discriminative learning;
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2013.2266794
  • Filename
    6525340