DocumentCode
1338075
Title
Diarization of Telephone Conversations Using Factor Analysis
Author
Kenny, Patrick ; Reynolds, Douglas ; Castaldo, Fabio
Author_Institution
Centre de Recherche Informatique de Montréal (CRIM), Montreal, QC, Canada
Volume
4
Issue
6
fYear
2010
Firstpage
1059
Lastpage
1070
Abstract
We report on work on speaker diarization of telephone conversations that began at the Robust Speaker Recognition Workshop held at Johns Hopkins University in 2008. Three diarization systems were developed, and experiments were conducted using the summed-channel telephone data from the 2008 NIST speaker recognition evaluation. The systems are a Baseline agglomerative clustering system; a Streaming system, which uses speaker factors for speaker change point detection and traditional methods for speaker clustering; and a Variational Bayes system designed to exploit a large number of speaker factors, as in state-of-the-art speaker recognition systems. The Variational Bayes system proved to be the most effective, achieving a diarization error rate of 1.0% on the summed-channel data. This represents an 85% reduction in errors compared with the Baseline agglomerative clustering system. An interesting aspect of the Variational Bayes approach is that it implicitly performs speaker clustering in a way that avoids making premature hard decisions. This type of soft speaker clustering can be incorporated into other diarization systems (although causality has to be sacrificed in the case of the Streaming system). With this modification, the Baseline system achieved a diarization error rate of 3.5% (a 50% reduction in errors).
Keywords
Bayes methods; pattern clustering; radiotelephony; speaker recognition; Bayes approach; baseline agglomerative clustering system; factor analysis; speaker change point detection; speaker clustering; speaker diarization; telephone conversations; Adaptation model; Bayesian methods; Clustering methods; Hidden Markov models; Speech; Channel factors; clustering; diarization; speaker factors; speaker segmentation; variational Bayes
fLanguage
English
Journal_Title
Selected Topics in Signal Processing, IEEE Journal of
Publisher
IEEE
ISSN
1932-4553
Type
jour
DOI
10.1109/JSTSP.2010.2081790
Filename
5587872