Codebook-Based Audio Feature Representation for Music Information Retrieval

Author

Vaizman, Yonatan ; McFee, Brian ; Lanckriet, Gert

Author_Institution

Dept. of Electr. & Comput. Eng., Univ. of California, San Diego, La Jolla, CA, USA

Volume

22

Issue

10

fYear

2014

fDate

Oct. 2014

Firstpage

1483

Lastpage

1493

Abstract

Digital music has become abundant on the web in recent decades. Automated recommendation systems are essential for users to discover music they love and for artists to reach an appropriate audience. When manual annotations and user preference data are lacking (e.g., for new artists), these systems must rely on content-based methods. Besides powerful machine learning tools for classification and retrieval, a key component of successful recommendation is the audio content representation. Good representations should capture informative musical patterns in the audio signal of songs. These representations should be concise, to enable efficient (low storage, easy indexing, fast search) management of huge music repositories, and should also be easy and fast to compute, to enable real-time interaction with a user supplying new songs to the system. Before designing new audio features, we explore the use of traditional local features, adding a stage of encoding with a pre-computed codebook and a stage of pooling to obtain compact vectorial representations. We experiment with different encoding methods, namely the LASSO, vector quantization (VQ), and cosine similarity (CS). We evaluate the representations' quality in two music information retrieval applications: query-by-tag and query-by-example. Our results show that concise representations can achieve successful performance in both applications. We recommend using top-τ VQ encoding, which consistently performs well in both applications and requires much less computation time than the LASSO.
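The encode-then-pool pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not define top-τ VQ, so this sketch assumes it means activating the τ nearest codewords (by Euclidean distance) for each frame with equal weight 1/τ, followed by mean pooling over frames. The function names, the distance measure, the weighting, and the pooling choice are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def top_tau_vq_encode(frames, codebook, tau=2):
    """Encode local feature frames against a pre-computed codebook.

    frames:   array of shape (n_frames, d), one local feature vector per row
    codebook: array of shape (k, d), one codeword per row (requires tau <= k)
    Returns an (n_frames, k) array with tau nonzero entries per row.
    Assumption: equal weights 1/tau on the tau nearest codewords.
    """
    # Squared Euclidean distance from every frame to every codeword.
    diffs = frames[:, None, :] - codebook[None, :, :]
    dist2 = (diffs ** 2).sum(axis=2)
    # Indices of the tau closest codewords per frame (order among them is arbitrary).
    nearest = np.argpartition(dist2, tau - 1, axis=1)[:, :tau]
    codes = np.zeros_like(dist2)
    np.put_along_axis(codes, nearest, 1.0 / tau, axis=1)
    return codes

def song_representation(frames, codebook, tau=2):
    """Mean-pool the per-frame codes into one k-dimensional vector per song."""
    return top_tau_vq_encode(frames, codebook, tau).mean(axis=0)
```

With tau=1 this reduces to standard VQ followed by a histogram of codeword counts normalized by the number of frames. Unlike the LASSO, no optimization problem is solved per frame, which is consistent with the lower computation time the abstract reports.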

Keywords

audio coding; content-based retrieval; feature extraction; indexing; learning (artificial intelligence); music; query processing; recommender systems; vector quantisation; LASSO; audio content representation; audio encoding; automated recommendation systems; codebook-based audio feature representation; content based methods; cosine similarity; digital music; informative musical patterns; machine learning tools; manual annotations; music discovery; music information retrieval; music repository management; precomputed codebook; query-by-example; query-by-tag; real-time user interaction; song audio signal; top-τ VQ encoding; user preference data; vector quantization; vectorial representations; Dictionaries; Encoding; Hidden Markov models; Speech; Speech processing; Training; Vectors; Audio content representations; music information retrieval; music recommendation; sparse coding; vector quantization

fLanguage

English

Journal_Title

Audio, Speech, and Language Processing, IEEE/ACM Transactions on

Publisher

IEEE

ISSN

2329-9290

Type

jour

DOI

10.1109/TASLP.2014.2337842

Filename

6851913