DocumentCode :
1574194
Title :
Advanced approaches to speaker diarization of audio documents
Author :
Markov, Konstantin
Author_Institution :
Human Interface Lab., Univ. of Aizu, Aizu-Wakamatsu, Japan
fYear :
2009
Firstpage :
179
Lastpage :
184
Abstract :
Speaker diarization is the process of annotating an audio document with information about the speaker identity of speech segments along with their start and end times. Assuming that the audio input consists of speech only or that non-speech segments have already been identified by another method, the task of speaker diarization is to find "who spoke when". Since there is no prior information about the number of speakers, the main approach is to apply segment clustering. According to the clustering algorithm used, speaker diarization systems can be divided into two groups: (1) those based on agglomerative clustering, and (2) those based on on-line clustering. Agglomerative clustering is an off-line approach and is used in most current systems because it gives accurate results and can be fine-tuned by performing several processing passes over the data. This, however, comes at the cost of a high computational load, which increases exponentially with the number of segments, and the requirement of having the whole audio document available in advance. In contrast, systems based on on-line clustering have an almost constant computational load and work in real time with small latency, but are generally less accurate than off-line systems. As we show in this paper, when using advanced on-line learning methods and original design, on-line systems can make fewer errors than off-line systems and can even work faster than real time with very low latency.
Keywords :
audio databases; pattern clustering; speaker recognition; speech processing; agglomerative clustering; audio documents; latency; speaker diarization; speech segmentation; Broadcasting; Clustering algorithms; Computational efficiency; Computer science; Delay; Humans; Loudspeakers; Natural languages; Real time systems; Speech processing; On-line GMM learning; Speaker diarization; Speaker segmentation;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Pervasive Computing (JCPC), 2009 Joint Conferences on
Conference_Location :
Tamsui, Taipei
Print_ISBN :
978-1-4244-5227-9
Electronic_ISBN :
978-1-4244-5228-6
Type :
conf
DOI :
10.1109/JCPC.2009.5420194
Filename :
5420194