DocumentCode
1214085
Title
Audio thumbnailing of popular music using chroma-based representations
Author
Bartsch, Mark A. ; Wakefield, Gregory H.
Author_Institution
Dept. of Electr. Eng. & Comput. Sci., Univ. of Michigan, Ann Arbor, MI, USA
Volume
7
Issue
1
fYear
2005
Firstpage
96
Lastpage
104
Abstract
With the growing prevalence of large databases of multimedia content, methods for facilitating rapid browsing of such databases or the results of a database search are becoming increasingly important. However, these methods are necessarily media dependent. We present a system for producing short, representative samples (or "audio thumbnails") of selections of popular music. The system searches for structural redundancy within a given song with the aim of identifying something like a chorus or refrain. To isolate a useful class of features for performing such structure-based pattern recognition, we present a development of the chromagram, a variation on traditional time-frequency distributions that seeks to represent the cyclic attribute of pitch perception, known as chroma. The pattern recognition system itself employs a quantized chromagram that represents the spectral energy at each of the 12 pitch classes. We evaluate the system on a database of popular music and score its performance against a set of "ideal" thumbnail locations. Overall performance is found to be quite good, with the majority of errors resulting from songs that do not meet our structural assumptions.
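As a rough illustration of the quantized chromagram described in the abstract, the sketch below folds the power spectrum of a single audio frame into 12 pitch-class bins. It is not the authors' implementation: the function name chroma_vector, the 440 Hz reference tuning, the Hann window, the 55 to 5000 Hz band limits, and the nearest-semitone rounding are all assumptions made for this example.

```python
import numpy as np


def chroma_vector(frame, sample_rate, f_ref=440.0, f_min=55.0, f_max=5000.0):
    """Sum spectral energy into 12 pitch classes (0 = C, 9 = A, 11 = B).

    Hypothetical helper: band limits, window, and tuning are assumptions,
    not values taken from the paper.
    """
    # Taper the frame to reduce spectral leakage, then take the power spectrum.
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    # Keep only bins inside the assumed band (this also drops the 0 Hz bin,
    # which would otherwise break the logarithm below).
    keep = (freqs >= f_min) & (freqs <= f_max)

    # Map each bin to its nearest equal-tempered semitone (MIDI note number),
    # then discard the octave by reducing modulo 12.
    midi = np.rint(69.0 + 12.0 * np.log2(freqs[keep] / f_ref)).astype(int)
    pitch_class = midi % 12

    # Accumulate the energy of every bin into its pitch class.
    chroma = np.zeros(12)
    np.add.at(chroma, pitch_class, power[keep])
    return chroma


# Usage: a pure 440 Hz tone (A4) should concentrate its energy in class 9.
sr = 8000
t = np.arange(4096) / sr
tone = np.sin(2.0 * np.pi * 440.0 * t)
strongest = int(np.argmax(chroma_vector(tone, sr)))
print(strongest)
```

Stacking such vectors over successive frames gives a 12-row feature sequence, which is the kind of representation the abstract says the system searches for repeated structure.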
Keywords
audio databases; audio signal processing; cepstral analysis; feature extraction; multimedia databases; music; query processing; time-frequency analysis; audio summarization; audio thumbnailing; chroma-based representations; chromagram; database search; multimedia content; musical structure; pitch perception; popular music; structure-based pattern recognition; time-frequency distributions; Costs; Feature extraction; Multimedia databases; Multimedia systems; Multiple signal classification; Pattern recognition; Redundancy; Spatial databases; Speech; Time frequency analysis
fLanguage
English
Journal_Title
Multimedia, IEEE Transactions on
Publisher
IEEE
ISSN
1520-9210
Type
jour
DOI
10.1109/TMM.2004.840597
Filename
1386245