  • DocumentCode
    1214085
  • Title
    Audio thumbnailing of popular music using chroma-based representations
  • Author
    Bartsch, Mark A. ; Wakefield, Gregory H.
  • Author_Institution
    Dept. of Electr. Eng. & Comput. Sci., Univ. of Michigan, Ann Arbor, MI, USA
  • Volume
    7
  • Issue
    1
  • fYear
    2005
  • Firstpage
    96
  • Lastpage
    104
  • Abstract
    With the growing prevalence of large databases of multimedia content, methods for facilitating rapid browsing of such databases or the results of a database search are becoming increasingly important. However, these methods are necessarily media dependent. We present a system for producing short, representative samples (or "audio thumbnails") of selections of popular music. The system searches for structural redundancy within a given song with the aim of identifying something like a chorus or refrain. To isolate a useful class of features for performing such structure-based pattern recognition, we present a development of the chromagram, a variation on traditional time-frequency distributions that seeks to represent the cyclic attribute of pitch perception, known as chroma. The pattern recognition system itself employs a quantized chromagram that represents the spectral energy at each of the 12 pitch classes. We evaluate the system on a database of popular music and score its performance against a set of "ideal" thumbnail locations. Overall performance is found to be quite good, with the majority of errors resulting from songs that do not meet our structural assumptions.
  • Keywords
    audio databases; audio signal processing; cepstral analysis; feature extraction; multimedia databases; music; query processing; time-frequency analysis; audio summarization; audio thumbnailing; chroma-based representations; chromagram; database search; feature extraction; multimedia content; musical structure; pitch perception; popular music; structure-based pattern recognition; time-frequency distributions; Costs; Feature extraction; Multimedia databases; Multimedia systems; Multiple signal classification; Pattern recognition; Redundancy; Spatial databases; Speech; Time frequency analysis;
  • fLanguage
    English
  • Journal_Title
    Multimedia, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1520-9210
  • Type
    jour
  • DOI
    10.1109/TMM.2004.840597
  • Filename
    1386245
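
The abstract describes a quantized chromagram that represents the spectral energy at each of the 12 pitch classes. The following is a minimal sketch of that general idea for a single audio frame, not the authors' implementation. The function name `chroma_vector`, the Hann window, the 440 Hz reference, and the 55 to 5000 Hz analysis band are all assumptions made for illustration; the record itself specifies none of them.

```python
import numpy as np

def chroma_vector(frame, sample_rate, ref_freq=440.0, fmin=55.0, fmax=5000.0):
    """Fold the power spectrum of one frame into 12 pitch-class bins.

    Hypothetical helper: with ref_freq = 440 Hz, pitch class 0 is A,
    1 is A#, and so on up to 11 (G#).
    """
    frame = np.asarray(frame, dtype=float)
    # Windowed power spectrum of the frame (window choice is an assumption)
    window = np.hanning(len(frame))
    power = np.abs(np.fft.rfft(frame * window)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    # Keep only the analysis band; this also excludes the 0 Hz bin,
    # whose logarithm is undefined
    keep = (freqs >= fmin) & (freqs <= fmax)

    # Distance in semitones from the reference, rounded to the nearest
    # semitone and wrapped modulo 12 so that octaves coincide
    semitones = 12.0 * np.log2(freqs[keep] / ref_freq)
    pitch_class = np.mod(np.round(semitones).astype(int), 12)

    # Accumulate spectral energy per pitch class
    chroma = np.zeros(12)
    np.add.at(chroma, pitch_class, power[keep])

    # Normalize to unit sum so frames of different loudness are comparable
    total = chroma.sum()
    return chroma / total if total > 0 else chroma


# Usage: a pure 440 Hz tone should concentrate its energy in pitch class 0 (A)
sr = 8000
t = np.arange(4096) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
print(int(np.argmax(chroma_vector(tone, sr))))
```

Computing such a vector for successive frames yields a 12-row time series, which is the kind of feature a structure search for repeated sections could compare across a song.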