• DocumentCode
    1754808
  • Title

    Codebook-Based Audio Feature Representation for Music Information Retrieval

  • Author

    Vaizman, Yonatan ; McFee, Brian ; Lanckriet, Gert

  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of California, San Diego, La Jolla, CA, USA
  • Volume
    22
  • Issue
    10
  • fYear
    2014
  • fDate
    Oct. 2014
  • Firstpage
    1483
  • Lastpage
    1493
  • Abstract
    Digital music has become prolific on the web in recent decades. Automated recommendation systems are essential for users to discover music they love and for artists to reach an appropriate audience. When manual annotations and user preference data are lacking (e.g., for new artists), these systems must rely on content-based methods. Besides powerful machine learning tools for classification and retrieval, a key component for successful recommendation is the audio content representation. Good representations should capture informative musical patterns in the audio signal of songs. These representations should be concise, to enable efficient (low storage, easy indexing, fast search) management of huge music repositories, and should also be easy and fast to compute, to enable real-time interaction with a user supplying new songs to the system. Before designing new audio features, we explore the usage of traditional local features, while adding a stage of encoding with a pre-computed codebook and a stage of pooling to get compact vectorial representations. We experiment with different encoding methods, namely the LASSO, vector quantization (VQ) and cosine similarity (CS). We evaluate the representations' quality in two music information retrieval applications: query-by-tag and query-by-example. Our results show that concise representations can be used for successful performance in both applications. We recommend using top-τ VQ encoding, which consistently performs well in both applications, and requires much less computation time than the LASSO.
  • Keywords
    audio coding; content-based retrieval; feature extraction; indexing; learning (artificial intelligence); music; query processing; recommender systems; vector quantisation; LASSO; audio content representation; audio encoding; automated recommendation systems; codebook-based audio feature representation; content based methods; cosine similarity; digital music; informative musical patterns; machine learning tools; manual annotations; music discovery; music information retrieval; music repository management; precomputed codebook; query-by-example; query-by-tag; real-time user interaction; song audio signal; top-τ VQ encoding; user preference data; vector quantization; vectorial representations; Dictionaries; Encoding; Hidden Markov models; Speech; Speech processing; Training; Vectors; Audio content representations; music information retrieval; music recommendation; sparse coding; vector quantization
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    2329-9290
  • Type

    jour

  • DOI
    10.1109/TASLP.2014.2337842
  • Filename
    6851913
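
The abstract describes a pipeline of local features, encoding with a pre-computed codebook, and pooling into one compact vector per song. The following is a minimal sketch of that idea for the top-τ VQ variant, not the authors' implementation. It assumes that "top-τ VQ" means activating the τ codewords nearest (in Euclidean distance) to each frame, and that pooling is a plain mean over frames; neither detail is stated in this record. The function names and the toy codebook are hypothetical.

```python
import numpy as np


def top_tau_vq_encode(frames, codebook, tau):
    """Encode each frame by marking its tau nearest codewords with 1.

    frames:   (T, d) array of local feature vectors (e.g. one per audio frame)
    codebook: (K, d) array of pre-computed codewords
    Returns a (T, K) array with exactly tau ones per row.
    Assumption: Euclidean distance and binary activations.
    """
    # Squared Euclidean distance between every frame and every codeword.
    diffs = frames[:, None, :] - codebook[None, :, :]
    dist2 = (diffs ** 2).sum(axis=2)
    # Indices of the tau smallest distances in each row (order not needed).
    nearest = np.argpartition(dist2, tau - 1, axis=1)[:, :tau]
    codes = np.zeros_like(dist2)
    np.put_along_axis(codes, nearest, 1.0, axis=1)
    return codes


def mean_pool(codes):
    """Average the per-frame codes into a single K-dimensional song vector."""
    return codes.mean(axis=0)


def song_representation(frames, codebook, tau=2):
    """Encoding stage followed by pooling stage."""
    return mean_pool(top_tau_vq_encode(frames, codebook, tau))


# Toy data: a hypothetical 4-codeword codebook in 2 dimensions.
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
frames = np.array([[0.1, 0.0], [0.9, 0.1]])
rep = song_representation(frames, codebook, tau=2)
```

With mean pooling over binary codes, the song vector always sums to τ, and each entry is the fraction of frames for which that codeword was among the τ nearest. Encoding a frame requires only K distance computations and a partial sort, which is in line with the abstract's remark that VQ needs much less computation than the LASSO.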