• DocumentCode
    3605834
  • Title

    Fast Single- and Cross-Show Speaker Diarization Using Binary Key Speaker Modeling

  • Author

    Delgado, Hector ; Anguera, Xavier ; Fredouille, Corinne ; Serrano, Javier

  • Author_Institution
    Dept. of Telecommun. & Syst. Eng., Autonomous Univ. of Barcelona, Cerdanyola, Spain
  • Volume
    23
  • Issue
    12
  • fYear
    2015
  • Firstpage
    2286
  • Lastpage
    2297
  • Abstract
    Speaker diarization has become a key process within other speech processing systems that take advantage of single-speaker speech signals. Furthermore, finding recurrent speakers among a set of audio recordings, known as cross-show diarization, has been gaining attention in recent years. Current state-of-the-art systems provide good performance, but usually at the cost of long processing times. This limitation may make current systems unsuitable for real-life applications. In this context, the speaker diarization approach based on binary key modeling provides a very fast yet accurate alternative. In this paper, we present the latest improvements to binary key speaker diarization, with the aim of further speeding up the process and improving performance. In addition, we propose a novel method for cross-show speaker diarization based on binary keys. Experimental results show the effectiveness of the proposed improvements for single-show speaker diarization, in terms of both speed and performance, obtaining a real-time factor of 0.0354xRT and a 16.8% relative improvement in performance. Furthermore, our proposed cross-show approach provides very competitive performance, only slightly worse than its single-show diarization counterpart, and exhibits a real-time factor of 0.04xRT.
  • Keywords
    audio recording; speaker recognition; speech processing; audio recordings; binary key modeling; binary key speaker diarization; binary key speaker modeling; cross-show speaker diarization; fast single-show speaker diarization; performance improvement; single-speaker speech signals; speech processing systems; Acoustic measurements; Data models; Density estimation robust algorithm; Real-time systems; Speech processing; Training; Binary key speaker modeling; cross-show speaker diarization; speaker diarization; within-class sum of squares;
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    2329-9290
  • Type

    jour

  • DOI
    10.1109/TASLP.2015.2479043
  • Filename
    7268861