• DocumentCode
    1161055
  • Title
    Multistage speaker diarization of broadcast news
  • Author
    Barras, Claude ; Zhu, Xuan ; Meignier, Sylvain ; Gauvain, Jean-Luc
  • Author_Institution
    Eng. Sci.-Nat. Center for Sci. Res., LIMSI-CNRS, Orsay
  • Volume
    14
  • Issue
    5
  • fYear
    2006
  • Firstpage
    1505
  • Lastpage
    1512
  • Abstract
    This paper describes recent advances in speaker diarization with a multistage segmentation and clustering system that incorporates a speaker identification step. The system builds upon the baseline audio partitioner used in the LIMSI broadcast news transcription system. The baseline partitioner provides high cluster purity, but tends to split the data of speakers with a large quantity of speech into several segment clusters. Several improvements to the baseline system have been made. First, the iterative Gaussian mixture model (GMM) clustering has been replaced by a Bayesian information criterion (BIC) agglomerative clustering. Second, an additional clustering stage has been added, using a GMM-based speaker identification method. Finally, a post-processing stage refines the segment boundaries using the output of a transcription system. On the National Institute of Standards and Technology (NIST) RT-04F and ESTER evaluation data, the multistage system reduces the speaker error by over 70% relative to the baseline system, and gives a reduction of between 40% and 50% relative to a single-stage BIC clustering system.
  • Keywords
    Bayes methods; Gaussian processes; broadcasting; iterative methods; pattern clustering; speaker recognition; Bayesian information criterion agglomerative clustering; ESTER evaluation data; LIMSI broadcast news transcription system; National Institute of Standards and Technology RT-04F; baseline audio partitioner; clustering system; high cluster purity; iterative Gaussian mixture model clustering; multistage segmentation; multistage speaker diarization; segment boundaries; speaker error reduction; speaker identification; split data; Background noise; Bayesian methods; Broadcasting; Computer errors; Indexing; Laboratories; Loudspeakers; NIST; Speech processing; Streaming media; Bayesian information criterion (BIC) clustering; speaker diarization; speaker identification (SID); speaker segmentation and clustering
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2006.878261
  • Filename
    1677972