• DocumentCode
    3585013
  • Title

    Background-tracking acoustic features for genre identification of broadcast shows

  • Author

    Saz, Oscar ; Doulaty, Mortaza ; Hain, Thomas

  • Author_Institution
    Dept. of Comput. Sci., Speech & Hearing Group, Univ. of Sheffield, Sheffield, UK
  • fYear
    2014
  • Firstpage
    118
  • Lastpage
    123
  • Abstract
    This paper presents a novel method for extracting acoustic features that characterise the background environment in audio recordings. These features are based on the output of an alignment that fits multiple parallel background-based Constrained Maximum Likelihood Linear Regression transformations asynchronously to the input audio signal. With this setup, the resulting features can track changes in the audio background, such as the appearance and disappearance of music, applause or laughter, independently of the speakers in the foreground of the audio. The ability to provide this type of acoustic description in audiovisual data has many potential applications, including automatic classification of broadcast archives or improving automatic transcription and subtitling. In this paper, the performance of these features in a genre identification task on a set of 332 BBC shows is explored. The proposed background-tracking features outperform short-term Perceptual Linear Prediction features in this task using Gaussian Mixture Model classifiers (72% vs 62% accuracy). The use of more complex classifiers, Hidden Markov Models and Support Vector Machines, increases the performance of the system with the novel background-tracking features to 79% and 81% accuracy respectively.
  • Keywords
    Gaussian processes; acoustic signal processing; audio signal processing; audio-visual systems; feature extraction; hidden Markov models; maximum likelihood estimation; mixture models; regression analysis; signal classification; support vector machines; 332 BBC show; Gaussian mixture model classifier; acoustic description; audio recording; audiovisual data; background-tracking acoustic feature extraction; broadcast show; genre identification; hidden Markov model; multiple parallel background-based constrained maximum likelihood linear regression transformation; short-term perceptual linear prediction; support vector machine; Abstracts; Acoustics; Biological system modeling; Hidden Markov models; Indexes; Acoustic background; broadcast data; genre identification;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Spoken Language Technology Workshop (SLT), 2014 IEEE
  • Type

    conf

  • DOI
    10.1109/SLT.2014.7078560
  • Filename
    7078560