Regional Information Center for Science and Technology - Prosodic and other Long-Term Features for Speaker Diarization

DocumentCode :

1063190

Title :

Prosodic and other Long-Term Features for Speaker Diarization

Author :

Friedland, Gerald ; Vinyals, Oriol ; Huang, Yan ; Müller, Christian

Author_Institution :

Int. Comput. Sci. Inst., Berkeley, CA

Volume :

Issue :

fYear :

2009

fDate :

7/1/2009

Firstpage :

985

Lastpage :

993

Abstract :

Speaker diarization is defined as the task of determining "who spoke when" given an audio track and no other prior knowledge of any kind. The following article shows how a state-of-the-art speaker diarization system can be improved by combining traditional short-term features (MFCCs) with prosodic and other long-term features. First, we present a framework to study the speaker discriminability of 70 different long-term features. Then, we show how the top-ranked long-term features can be combined with short-term features to increase the accuracy of speaker diarization. The results were measured on standardized datasets (NIST RT) and show a consistent improvement of about 30% relative in diarization error rate compared to the best system presented at the NIST evaluation in 2007.

Keywords :

audio signal processing; cepstral analysis; MFCC; audio track; long-term features; mel-frequency cepstral coefficients; speaker diarization; speaker discriminability; Computer science; Density estimation robust algorithm; Error analysis; Mel frequency cepstral coefficient; NIST; Speaker recognition; Speech analysis; Speech processing; System testing; prosody

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2009.2015089

Filename :

5067417

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1063190