Title :
Audio thumbnailing of popular music using chroma-based representations
Author :
Bartsch, Mark A. ; Wakefield, Gregory H.
Author_Institution :
Dept. of Electrical Engineering & Computer Science, University of Michigan, Ann Arbor, MI, USA
Abstract :
With the growing prevalence of large databases of multimedia content, methods for facilitating rapid browsing of such databases or the results of a database search are becoming increasingly important. However, these methods are necessarily media dependent. We present a system for producing short, representative samples (or "audio thumbnails") of selections of popular music. The system searches for structural redundancy within a given song with the aim of identifying something like a chorus or refrain. To isolate a useful class of features for performing such structure-based pattern recognition, we present a development of the chromagram, a variation on traditional time-frequency distributions that seeks to represent the cyclic attribute of pitch perception, known as chroma. The pattern recognition system itself employs a quantized chromagram that represents the spectral energy at each of the 12 pitch classes. We evaluate the system on a database of popular music and score its performance against a set of "ideal" thumbnail locations. Overall performance is found to be quite good, with the majority of errors resulting from songs that do not meet our structural assumptions.
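As an illustration of the idea behind the quantized chromagram, the sketch below folds spectral energy into 12 pitch classes. It is a minimal sketch and not the authors' implementation: the 440 Hz tuning reference, the rounding to the nearest equal-tempered semitone, the normalization, and the names `pitch_class` and `chroma_vector` are all assumptions made for this example.

```python
import math

A4_HZ = 440.0  # assumed tuning reference; not specified in the abstract


def pitch_class(freq_hz, ref_hz=A4_HZ):
    """Map a frequency in Hz to a pitch class index (0 = C, ..., 9 = A, 11 = B)."""
    # Distance from the reference A in equal-tempered semitones, rounded to the nearest note.
    semitones_from_a = round(12 * math.log2(freq_hz / ref_hz))
    # A is pitch class 9 when C is 0; the modulo discards octave information.
    return (semitones_from_a + 9) % 12


def chroma_vector(freqs_hz, energies):
    """Sum spectral energy into 12 pitch-class bins and normalize to unit sum."""
    chroma = [0.0] * 12
    for freq, energy in zip(freqs_hz, energies):
        if freq > 0:  # skip the DC bin, which has no pitch
            chroma[pitch_class(freq)] += energy
    total = sum(chroma)
    return [c / total for c in chroma] if total > 0 else chroma


# Hypothetical spectrum for one analysis frame: two A's an octave apart and one C.
frame = chroma_vector([220.0, 440.0, 261.63], [1.0, 1.0, 2.0])
```

Because octaves collapse onto the same bin, the two A components at 220 Hz and 440 Hz contribute to a single pitch class. Computing one such vector per analysis frame would give a 12-row representation over time in which repeated sections such as a chorus could be compared.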
Keywords :
audio databases; audio signal processing; cepstral analysis; feature extraction; multimedia databases; music; query processing; time-frequency analysis; audio summarization; audio thumbnailing; chroma-based representations; chromagram; database search; multimedia content; musical structure; pitch perception; popular music; structure-based pattern recognition; time-frequency distributions; Costs; Multimedia systems; Multiple signal classification; Pattern recognition; Redundancy; Spatial databases; Speech
Journal_Title :
Multimedia, IEEE Transactions on
DOI :
10.1109/TMM.2004.840597