DocumentCode :
1426664
Title :
A Comparative Study of Bottom-Up and Top-Down Approaches to Speaker Diarization
Author :
Evans, Nicholas ; Bozonnet, Simon ; Wang, Dong ; Fredouille, Corinne ; Troncy, Raphaël
Author_Institution :
Dept. of Multimedia Commun., EURECOM, Sophia Antipolis, France
Volume :
20
Issue :
2
fYear :
2012
Firstpage :
382
Lastpage :
392
Abstract :
This paper presents a theoretical framework to analyze the relative merits of the two most general, dominant approaches to speaker diarization involving bottom-up and top-down hierarchical clustering. We present an original qualitative comparison which argues how the two approaches are likely to exhibit different behavior in speaker inventory optimization and model training: bottom-up approaches will capture comparatively purer models and will thus be more sensitive to nuisance variation such as that related to the speech content; top-down approaches, in contrast, will produce less discriminative speaker models but, importantly, models which are potentially better normalized against nuisance variation. We report experiments conducted on two standard, single-channel NIST RT evaluation datasets which validate our hypotheses. Results show that competitive performance can be achieved with both bottom-up and top-down approaches (average DERs of 21% and 22%), and that neither approach is superior. Speaker purification, which aims to improve speaker discrimination, gives more consistent improvements with the top-down system than with the bottom-up system (average DERs of 19% and 25%), thereby confirming that the top-down system is less discriminative and that the bottom-up system is less stable. Finally, we report a new combination strategy that exploits the merits of the two approaches. Combination delivers an average DER of 17% and confirms the intrinsic complementarity of the two approaches.
Keywords :
pattern clustering; speaker recognition; bottom-up hierarchical clustering; nuisance variation; single-channel NIST RT evaluation datasets; speaker diarization; speaker discrimination; speaker inventory optimization; speaker model training; speaker purification; top-down hierarchical clustering; Acoustics; Data models; Hidden Markov models; Merging; NIST; Speech; Training; Clustering; rich transcription; segmentation
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2011.2159710
Filename :
6135545