Title :
Model-based speech/non-speech segmentation of a heterogeneous multilingual TV broadcast collection
Author :
Desplanques, Brecht ; Martens, Jean-Pierre
Author_Institution :
ELIS Multimedia Lab., Ghent Univ. - iMinds, Ghent, Belgium
Abstract :
Multimedia Information Retrieval systems normally comprise a preprocessor that performs a speech/non-speech (SNS) segmentation of the audio stream. The goal of such a segmentation is to divide the audio into intervals that need a lexical transcription and intervals that just need some categorization in terms of jingle, applause, etc. In this paper, a baseline SNS system that was trained on monolingual broadcast news (BN) data is evaluated on a multilingual BN corpus and on a heterogeneous corpus composed of diverse TV shows including discussions, soaps, animation films, etc. It appears that the system exhibits serious deficiencies when confronted with such out-of-domain data. The heterogeneous corpus in particular, characterized by many short speaker turns and a rich palette of non-speech intervals, turns out to be challenging. However, using a proper SNS information criterion, it is demonstrated that enhancing the acoustic representation of the audio, creating a richer music model, and performing a file-wise adaptation of the acoustic models can significantly increase the performance. Complex architectures permitting explicit duration modeling and re-segmentation of the speech parts after speaker change detection, on the other hand, do not seem to help.
Keywords :
acoustic signal processing; audio acoustics; audio signal processing; audio streaming; information retrieval; multimedia systems; signal representation; speech processing; television broadcasting; SNS information criterion; acoustic models; acoustic representation; audio stream; heterogeneous corpus; heterogeneous multilingual TV broadcast collection; lexical transcription; model-based speech-nonspeech segmentation; multimedia information retrieval systems; speaker change detection; Acoustics; Adaptation models; Computational modeling; Data models; Hidden Markov models; Speech; TV;
Conference_Title :
Intelligent Signal Processing and Communications Systems (ISPACS), 2013 International Symposium on
Conference_Location :
Naha
Print_ISBN :
978-1-4673-6360-0
DOI :
10.1109/ISPACS.2013.6704522