Title :
Initial conditions for speaker diarization
Abstract :
We examine different initialization methods and their influence on the performance of an iterative speaker diarization system. Six initialization methods were examined, ranging from a naive frame-based random initialization, through a uniform division of the conversation among the clusters, to weighted segmental k-means. The methods were tested on two telephone conversation databases: LDC America CallHome and NIST SRE-05. In most work on meetings and shows, speaker turns are infrequent, so a minimum duration constraint of 2.5 sec or more can be applied to capture speaker statistics. In telephone conversations, by contrast, speaker turns are much more frequent and the minimum duration should be set to several hundred milliseconds. In such cases, good cluster initialization is very important. We show that initialization using weighted segmental k-means outperforms all other methods, that the fixed or minimum duration constraint can then be small, and that even without any constraint on segment duration the results are significantly better than with the other initializations.
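The two simplest initializations named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the per-frame label representation, and the seed parameter are all assumptions made for illustration, and the weighted segmental k-means method is not sketched because the abstract does not describe it.

```python
import random


def random_frame_init(num_frames, num_clusters, seed=0):
    """Naive frame-based random initialization (hypothetical helper).

    Each frame is assigned independently to a random cluster.
    """
    rng = random.Random(seed)  # fixed seed only for reproducibility
    return [rng.randrange(num_clusters) for _ in range(num_frames)]


def uniform_division_init(num_frames, num_clusters):
    """Uniform division of the conversation among clusters (hypothetical helper).

    The frame sequence is cut into num_clusters contiguous blocks of
    (nearly) equal length; block k is assigned to cluster k.
    """
    return [i * num_clusters // num_frames for i in range(num_frames)]


if __name__ == "__main__":
    print(random_frame_init(10, 2))
    print(uniform_division_init(6, 2))
```

Either label sequence would then serve as the starting point for the iterative re-estimation and re-segmentation loop of the diarization system.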
Keywords :
iterative methods; speaker recognition; statistical analysis; LDC America CallHome; NIST SRE-05; fixed duration constraints; iterative speaker diarization system; minimum duration constraints; naive frame based random initialization; speakers statistics; telephone conversation databases; telephone conversations; time 2.5 s; uniform conversation; weighted segmental k-means; Databases; Density estimation robust algorithm; Hidden Markov models; NIST; Speech; Training; Vectors; Speaker diarization; initialization; weighted segmental k-means (WSKMeans);
Conference_Titel :
Electrical & Electronics Engineers in Israel (IEEEI), 2012 IEEE 27th Convention of
Conference_Location :
Eilat
Print_ISBN :
978-1-4673-4682-5
DOI :
10.1109/EEEI.2012.6376947