• DocumentCode
    669783
  • Title
    Model-based speech/non-speech segmentation of a heterogeneous multilingual TV broadcast collection
  • Author
    Desplanques, Brecht ; Martens, Jean-Pierre
  • Author_Institution
    ELIS Multimedia Lab., Ghent Univ. - iMinds, Ghent, Belgium
  • fYear
    2013
  • fDate
    12-15 Nov. 2013
  • Firstpage
    55
  • Lastpage
    60
  • Abstract
    Multimedia Information Retrieval systems normally comprise a preprocessor that performs a speech/non-speech (SNS) segmentation of the audio stream. The goal of such a segmentation is to divide the audio into intervals that need a lexical transcription and intervals that just need some categorization in terms of jingle, applause, etc. In this paper, a baseline SNS system that was trained on monolingual broadcast news (BN) data is evaluated on a multilingual BN corpus and on a heterogeneous corpus composed of diverse TV shows including discussions, soaps, animation films, etc. The system exhibits serious deficiencies when confronted with such out-of-domain data. The heterogeneous corpus in particular, characterized by many short speaker turns and a rich palette of non-speech intervals, turns out to be challenging. However, using a proper SNS information criterion, the paper demonstrates that enhancing the acoustic representation of the audio, creating a richer music model, and performing a file-wise adaptation of the acoustic models can significantly increase performance. In contrast, complex architectures that permit explicit duration modeling and re-segmentation of the speech parts after speaker change detection do not seem to help.
  • Keywords
    acoustic signal processing; audio acoustics; audio signal processing; audio streaming; information retrieval; multimedia systems; signal representation; speech processing; television broadcasting; SNS information criterion; acoustic models; acoustic representation; audio stream; heterogeneous corpus; heterogeneous multilingual TV broadcast collection; lexical transcription; model-based speech-nonspeech segmentation; multimedia information retrieval systems; speaker change detection; Acoustics; Adaptation models; Computational modeling; Data models; Hidden Markov models; Speech; TV;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Signal Processing and Communications Systems (ISPACS), 2013 International Symposium on
  • Conference_Location
    Naha
  • Print_ISBN
    978-1-4673-6360-0
  • Type
    conf
  • DOI
    10.1109/ISPACS.2013.6704522
  • Filename
    6704522