• DocumentCode
    3528166
  • Title

    Fusing short term and long term features for improved speaker diarization

  • Author

    Friedland, Gerald ; Vinyals, Oriol ; Huang, Yan ; Müller, Christian

  • Author_Institution
    Int. Comput. Sci. Inst., Berkeley, CA
  • fYear
    2009
  • fDate
    19-24 April 2009
  • Firstpage
    4077
  • Lastpage
    4080
  • Abstract
    This article shows how a state-of-the-art speaker diarization system can be improved by combining traditional short-term features (MFCCs) with prosodic and other long-term features. First, we present a framework to study the speaker discriminability of 70 different long-term features. Then, we show how the top-ranked long-term features can be combined with short-term features to increase the accuracy of speaker diarization. The results were measured on standardized data sets (NIST RT) and show a consistent relative improvement of about 30% in diarization error rate compared to the best system presented at the NIST evaluation in 2007. This result was also verified on a wider set of 21 meetings from previous evaluations, which we call CombDev. Since the prosodic and long-term features were selected using a diarization-independent speaker-discriminability study, we are confident that the same features can improve other systems that perform similar tasks.
  • Keywords
    feature extraction; speaker recognition; CombDev; MFCC; diarization error rate; diarization-independent speaker-discriminability; long-term features; speaker diarization; speaker discriminability; Audio recording; Cepstral analysis; Clustering algorithms; Error analysis; Mel frequency cepstral coefficient; NIST; Speech; Testing; Prosody
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on
  • Conference_Location
    Taipei
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4244-2353-8
  • Electronic_ISBN
    1520-6149
  • Type

    conf

  • DOI
    10.1109/ICASSP.2009.4960524
  • Filename
    4960524