DocumentCode
1161055
Title
Multistage speaker diarization of broadcast news
Author
Barras, Claude ; Zhu, Xuan ; Meignier, Sylvain ; Gauvain, Jean-Luc
Author_Institution
Eng. Sci.-Nat. Center for Sci. Res., LIMSI-CNRS, Orsay
Volume
14
Issue
5
fYear
2006
Firstpage
1505
Lastpage
1512
Abstract
This paper describes recent advances in speaker diarization with a multistage segmentation and clustering system, which incorporates a speaker identification step. This system builds upon the baseline audio partitioner used in the LIMSI broadcast news transcription system. The baseline partitioner provides a high cluster purity, but has a tendency to split data from speakers with a large quantity of data into several segment clusters. Several improvements to the baseline system have been made. First, the iterative Gaussian mixture model (GMM) clustering has been replaced by a Bayesian information criterion (BIC) agglomerative clustering. Second, an additional clustering stage has been added, using a GMM-based speaker identification method. Finally, a post-processing stage refines the segment boundaries using the output of a transcription system. On the National Institute of Standards and Technology (NIST) RT-04F and ESTER evaluation data, the multistage system reduces the speaker error by over 70% relative to the baseline system, and gives between 40% and 50% reduction relative to a single-stage BIC clustering system.
Keywords
Bayes methods; Gaussian processes; broadcasting; iterative methods; pattern clustering; speaker recognition; Bayesian information criterion agglomerative clustering; ESTER evaluation data; LIMSI broadcast news transcription system; National Institute of Standards and Technology RT-04F; baseline audio partitioner; clustering system; high cluster purity; iterative Gaussian mixture model clustering; multistage segmentation; multistage speaker diarization; segment boundaries; speaker error reduction; speaker identification; split data; Background noise; Bayesian methods; Broadcasting; Computer errors; Indexing; Laboratories; Loudspeakers; NIST; Speech processing; Streaming media; Bayesian information criterion (BIC) clustering; speaker diarization; speaker identification (SID); speaker segmentation and clustering;
fLanguage
English
Journal_Title
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher
IEEE
ISSN
1558-7916
Type
jour
DOI
10.1109/TASL.2006.878261
Filename
1677972