DocumentCode :
27697
Title :
A Systematic Evaluation of the Bag-of-Frames Representation for Music Information Retrieval
Author :
Li Su ; Chin-Chia Michael Yeh ; Jen-Yu Liu ; Ju-Chiang Wang ; Yi-Hsuan Yang
Author_Institution :
Res. Center for Inf. Technol. Innovation, Acad. Sinica, Taipei, Taiwan
Volume :
16
Issue :
5
fYear :
2014
fDate :
Aug. 2014
Firstpage :
1188
Lastpage :
1200
Abstract :
There has been increasing attention to learning feature representations from the complex, high-dimensional audio data used in various music information retrieval (MIR) problems. Unsupervised feature learning techniques, such as sparse coding and deep belief networks, have been utilized to represent music information as a term-document structure comprising elementary audio codewords. Despite the widespread use of such a bag-of-frames (BoF) model, few attempts have been made to systematically compare different component settings. Moreover, whether techniques developed in the text retrieval community are applicable to audio codewords is poorly understood. To further our understanding of the BoF model, we present in this paper a comprehensive evaluation that compares a large number of BoF variants on three different MIR tasks, by considering different ways of low-level feature representation, codebook construction, codeword assignment, segment-level and song-level feature pooling, tf-idf term weighting, power normalization, and dimension reduction. Our evaluations lead to the following findings: 1) modeling music information by two levels of abstraction improves the result for difficult tasks such as predominant instrument recognition, 2) tf-idf weighting and power normalization improve system performance in general, 3) topic modeling methods such as latent Dirichlet allocation do not work for audio codewords.
Keywords :
information retrieval; music; unsupervised learning; BoF model; MIR; audio codewords; bag-of-frames representation; feature representations; music information modeling; music information retrieval problems; power normalization; tf-idf weighting; unsupervised feature learning; Frequency measurement; Matching pursuit algorithms; Mel frequency cepstral coefficient; Multiple signal classification; Music information retrieval; Spectrogram; Training; Bag-of-frames model; music information retrieval; sparse coding; unsupervised feature learning;
fLanguage :
English
Journal_Title :
Multimedia, IEEE Transactions on
Publisher :
IEEE
ISSN :
1520-9210
Type :
jour
DOI :
10.1109/TMM.2014.2311016
Filename :
6763025