DocumentCode :
1427799
Title :
Speaker Clustering and Cluster Purification Methods for RT07 and RT09 Evaluation Meeting Data
Author :
Nwe, Tin Lay ; Sun, Hanwu ; Ma, Bin ; Li, Haizhou
Author_Institution :
Human Language Technol. Dept., A*STAR, Singapore, Singapore
Volume :
20
Issue :
2
fYear :
2012
Firstpage :
461
Lastpage :
473
Abstract :
This paper presents a design strategy for the speaker diarization system in the IIR submissions to the 2007 and 2009 NIST Rich Transcription Meeting Recognition Evaluations (RT07 and RT09) for the multiple distant microphone (MDM) condition. The system features two algorithms that support two key steps of the diarization process. The first step is Initial Segmentation and Clustering (ISC), and the second is cluster merging and purification. In the ISC step, we propose a histogram quantization and clustering technique based on time delay of arrival (TDOA) features, obtained by analyzing the correlation among the signals across multiple distant microphones. In the cluster merging and purification step, we further merge the speaker clusters using the Bayesian information criterion (BIC), consolidating them to arrive at one cluster per speaker. The two steps work in tandem to form an integral process. We also propose a novel Consensus Based Cluster Purification (CBCP) method, which removes impure speaker segments from the speaker clusters before speaker modeling in the cluster purification process. The system achieves state-of-the-art speaker diarization performance on the RT07 and RT09 MDM conditions, with diarization error rates (DERs) of 7.47% and 8.77%, respectively, for both overlapping and non-overlapping speech.
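The BIC-based cluster merging step mentioned in the abstract can be illustrated with the standard delta-BIC merge test. This is a minimal sketch under stated assumptions, not the authors' implementation: each cluster is assumed to be a list of fixed-dimension feature frames (for example MFCC vectors) modeled by a single diagonal-covariance Gaussian, and the names delta_bic, should_merge, bic_merge and the penalty weight penalty_weight (often written as lambda) are hypothetical. A positive delta-BIC favors keeping two clusters separate; the greedy loop merges the closest pair until no remaining pair has a negative delta-BIC.

```python
import math
import random
from typing import List, Sequence

Frame = Sequence[float]


def diag_log_det(frames: Sequence[Frame]) -> float:
    """Log-determinant of a diagonal ML covariance fitted to `frames`."""
    n = len(frames)
    d = len(frames[0])
    total = 0.0
    for k in range(d):
        col = [f[k] for f in frames]
        mean = sum(col) / n
        # ML (biased) variance, as used in the Gaussian log-likelihood.
        var = sum((x - mean) ** 2 for x in col) / n
        # Floor the variance so a constant dimension does not give log(0).
        total += math.log(max(var, 1e-10))
    return total


def delta_bic(a: Sequence[Frame], b: Sequence[Frame],
              penalty_weight: float = 1.0) -> float:
    """Delta-BIC of modeling a and b separately versus jointly.

    Positive values favor two separate clusters (different speakers).
    """
    ab = list(a) + list(b)
    n, n1, n2 = len(ab), len(a), len(b)
    d = len(ab[0])
    # Log-likelihood gain of two Gaussians over one (constants cancel).
    gain = 0.5 * (n * diag_log_det(ab)
                  - n1 * diag_log_det(a)
                  - n2 * diag_log_det(b))
    # The extra Gaussian adds d means and d variances: 2d parameters.
    penalty = 0.5 * penalty_weight * (2 * d) * math.log(n)
    return gain - penalty


def should_merge(a: Sequence[Frame], b: Sequence[Frame],
                 penalty_weight: float = 1.0) -> bool:
    return delta_bic(a, b, penalty_weight) < 0


def bic_merge(clusters: Sequence[Sequence[Frame]],
              penalty_weight: float = 1.0) -> List[List[Frame]]:
    """Greedily merge the pair with the lowest delta-BIC while it is negative."""
    work = [list(c) for c in clusters]  # copy so the inputs are not mutated
    while len(work) > 1:
        best = None
        for i in range(len(work)):
            for j in range(i + 1, len(work)):
                score = delta_bic(work[i], work[j], penalty_weight)
                if best is None or score < best[0]:
                    best = (score, i, j)
        score, i, j = best
        if score >= 0:
            break  # every remaining pair looks like a different speaker
        work[i].extend(work[j])
        del work[j]  # j > i, so index i is still valid
    return work


if __name__ == "__main__":
    rng = random.Random(0)

    def blob(mu: float, n: int = 200) -> List[List[float]]:
        return [[rng.gauss(mu, 1.0), rng.gauss(mu, 1.0)] for _ in range(n)]

    # Two clusters drawn from the same "speaker" and one from another.
    merged = bic_merge([blob(0.0), blob(0.0), blob(10.0)])
    print([len(c) for c in merged])
```

A single diagonal Gaussian per cluster keeps the sketch short; practical diarization systems often use richer cluster models (full-covariance Gaussians or GMMs) and tune the penalty weight on development data.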
Keywords :
Bayes methods; correlation methods; feature extraction; microphone arrays; pattern clustering; speaker recognition; time-of-arrival estimation; BIC; Bayesian information criterion; CBCP method; NIST Rich Transcription Meeting Recognition Evaluations; RT07 evaluation meeting data; RT09 evaluation meeting data; TDOA features; cluster merging; consensus based cluster purification method; diarization error rates; histogram quantization technique; initial segmentation and clustering; multiple distant microphone condition; one-cluster-per-speaker technique; signal correlation; speaker clustering; speaker diarization system; speaker modeling; time delay of arrival features; Histograms; Mel frequency cepstral coefficient; Merging; Microphones; NIST; Quantization; Speech; Clustering methods; delay estimation; expert systems; meeting audio;
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2011.2159203
Filename :
6136543