DocumentCode :
2446023
Title :
Robustness Improvement of Speaker Segmentation Techniques Based on the Bayesian Information Criterion
Author :
Kadri, Hachem ; Lachiri, Zied ; Ellouze, Noureddine
Volume :
1
fYear :
2006
fDate :
24-28 April 2006
Firstpage :
1300
Lastpage :
1301
Abstract :
Speaker segmentation is the problem of finding the boundaries at which a speaker begins and stops speaking in an audio stream. This segmentation of audio data is of interest to a broad class of applications such as surveillance, meeting summarization, and indexing of broadcast news. Unsupervised speaker segmentation approaches assume that no information about the speakers or their number is known a priori. They can be grouped into three categories: energy-based, metric-based, and model-selection-based segmentation. • Energy-based segmentation: silence in the input audio stream is detected either by a decoder or directly by measuring and thresholding the audio energy. The segments are then generated by cutting the input at silence locations. • Metric-based segmentation: the audio stream is segmented at maxima of the distances between neighboring windows placed at evenly spaced time intervals. • Model-selection-based segmentation [4]: assuming that the data are generated by a Gaussian process, speaker changes are detected by applying a statistical decision criterion within a window that slides through the audio stream. A widely used technique for speaker segmentation is based on the Bayesian Information Criterion (BIC). BIC segmentation offers the advantages of robustness and threshold independence. However, this method is extremely computationally expensive and can introduce an estimation error due to insufficient data when speaker turns are close to each other.
Keywords :
Bayesian Information Criterion; Hotelling's T²; Unsupervised audio segmentation; speaker change detection; Bayesian methods; Broadcasting; Decoding; Energy measurement; Estimation error; Gaussian processes; Indexing; Robustness; Streaming media; Surveillance
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information and Communication Technologies, 2006. ICTTA '06. 2nd
Print_ISBN :
0-7803-9521-2
Type :
conf
DOI :
10.1109/ICTTA.2006.1684567
Filename :
1684567