Regional Information Center for Science and Technology (RICeST) - An Information Theoretic Approach to Speaker Diarization of Meeting Data

DocumentCode :

1135747

Title :

An Information Theoretic Approach to Speaker Diarization of Meeting Data

Author :

Vijayasenan, Deepu ; Valente, Fabio ; Bourlard, Hervé

Author_Institution :

Idiap Res. Inst., Martigny, Switzerland

Volume :

Issue :

Year :

2009

Firstpage :

1382

Lastpage :

1393

Abstract :

A speaker diarization system based on an information theoretic framework is described. The problem is formulated according to the information bottleneck (IB) principle. Unlike other approaches where the distance between speaker segments is arbitrarily introduced, the IB method seeks the partition that maximizes the mutual information between observations and variables relevant for the problem while minimizing the distortion between observations. This solves the problem of choosing the distance between speech segments, which becomes the Jensen-Shannon divergence as it arises from the IB objective function optimization. We discuss issues related to speaker diarization using this information theoretic framework, such as the criteria for inferring the number of speakers, the tradeoff between quality and compression achieved by the diarization system, and the algorithms for optimizing the objective function. Furthermore, we benchmark the proposed system against a state-of-the-art system on the NIST RT06 (Rich Transcription) data set for speaker diarization of meetings. The IB-based system achieves a diarization error rate of 23.2% compared to 23.6% for the baseline system. Because the approach is mainly based on nonparametric clustering, it runs significantly faster than the baseline HMM/GMM-based system, resulting in faster-than-real-time diarization.
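
The abstract states that the distance between speech segments becomes the Jensen-Shannon divergence under the IB objective. The following is a minimal sketch of what such a merge cost could look like, assuming the standard agglomerative IB formulation in which merging clusters i and j costs (p(c_i) + p(c_j)) times the prior-weighted Jensen-Shannon divergence between their relevance-variable posteriors. The function names (`kl`, `js_divergence`, `merge_cost`) and the toy distributions are hypothetical illustrations and are not taken from the paper.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q, w_p=0.5, w_q=0.5):
    """Weighted Jensen-Shannon divergence between discrete distributions p and q.

    w_p and w_q are positive mixture weights that sum to 1.
    """
    mixture = [w_p * pi + w_q * qi for pi, qi in zip(p, q)]
    return w_p * kl(p, mixture) + w_q * kl(q, mixture)


def merge_cost(prior_i, post_i, prior_j, post_j):
    """Assumed agglomerative-IB cost of merging two clusters.

    prior_*: cluster probabilities p(c); post_*: posteriors p(y | c)
    over the relevance variables.
    """
    total = prior_i + prior_j
    return total * js_divergence(post_i, post_j, prior_i / total, prior_j / total)


if __name__ == "__main__":
    # Toy posteriors over three relevance variables (illustrative values only).
    seg_a = [0.7, 0.2, 0.1]
    seg_b = [0.6, 0.3, 0.1]
    seg_c = [0.1, 0.2, 0.7]
    print("cost(a, b):", merge_cost(0.3, seg_a, 0.3, seg_b))
    print("cost(a, c):", merge_cost(0.3, seg_a, 0.4, seg_c))
```

Under this assumption, a greedy clustering would repeatedly merge the pair of segments with the lowest cost, so segments with similar posteriors (likely the same speaker) are joined first.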

Keywords :

hidden Markov models; speaker recognition; GMM based system; HMM; Jensen-Shannon divergence; diarization error rate; information bottleneck principle; information theoretic approach; objective function optimization; rich transcription; speaker diarization system; speech segments; Acoustic distortion; Automatic speech recognition; Error analysis; Hidden Markov models; Indexing; Loudspeakers; Mutual information; NIST; Streaming media; Vocabulary; Information bottleneck (IB); meetings data; speaker diarization;

Language :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2009.2015698

Filename :

5165121

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1135747