• DocumentCode
    3585546
  • Title
    A Cluster Purification Algorithm for Speaker Diarization System
  • Author
    Zhang Xiang
  • Author_Institution
    Sch. of Inf. & Commun. Eng., Beijing Univ. of Posts & Telecommun., Beijing, China
  • Volume
    2
  • fYear
    2014
  • Firstpage
    538
  • Lastpage
    541
  • Abstract
    In speaker diarization systems, it is common to use a bottom-up clustering method in which the input data is first split into small segments and the most similar segments are then merged until a stopping point is reached. However, it is not easy to ensure that every merge selects the right pair, and these errors tend to degrade the results of subsequent merges. In this paper, a fast cluster purification algorithm is introduced that runs once the number of clusters reaches a pre-estimated number K, which is no less than the real number of speakers involved in the conversation, and tries to remedy the errors by moving inappropriate segments into the right cluster. An effective way to estimate the number of speakers before the clustering stage is also introduced. The experimental results show improvement in both cluster purity and the diarization error rate (DER) after applying the purification algorithm: on average, purity improves by 0.8% and DER is reduced by 1.14%.
  • Keywords
    error statistics; pattern clustering; speaker recognition; DER; cluster purification algorithm; cluster segments; clusters size; diarization error rate; segment shuffling; speaker diarization system; speaker number estimation; speakers conversation; Clustering algorithms; Conferences; Density estimation robust algorithm; Measurement; Purification; Speech; Speech processing; cluster purification; speaker diarization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence and Design (ISCID), 2014 Seventh International Symposium on
  • Print_ISBN
    978-1-4799-7004-9
  • Type
    conf
  • DOI
    10.1109/ISCID.2014.129
  • Filename
    7082048
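
The abstract describes moving misplaced segments into the right cluster and measuring the effect on cluster purity, but it does not give the procedure. The following is only a minimal sketch of that general idea, not the paper's algorithm: it assumes each segment is a plain feature vector, uses a nearest-centroid reassignment rule, and uses the common majority-speaker definition of purity. The function names `purify` and `cluster_purity`, the distance measure, and the toy data are all hypothetical.

```python
from collections import Counter


def cluster_purity(cluster_labels, speaker_labels):
    """Fraction of segments that belong to the majority speaker of their cluster."""
    counts = {}
    for cluster, speaker in zip(cluster_labels, speaker_labels):
        counts.setdefault(cluster, Counter())[speaker] += 1
    majority_total = sum(max(c.values()) for c in counts.values())
    return majority_total / len(cluster_labels)


def _centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def _sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def purify(features, cluster_labels, max_iter=10):
    """Reassign each segment to its nearest cluster centroid until labels stop changing.

    Assumption: nearest-centroid reassignment stands in for the paper's
    (unspecified) criterion for deciding that a segment is misplaced.
    """
    labels = list(cluster_labels)
    for _ in range(max_iter):
        groups = {}
        for vector, cluster in zip(features, labels):
            groups.setdefault(cluster, []).append(vector)
        centroids = {cluster: _centroid(vs) for cluster, vs in groups.items()}
        new_labels = [
            min(centroids, key=lambda cluster: _sq_dist(vector, centroids[cluster]))
            for vector in features
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return labels


# Toy data: the last segment was merged into the wrong cluster.
features = [[0.0], [0.1], [0.2], [5.0], [5.1], [0.3]]
before = [0, 0, 0, 1, 1, 1]
speakers = ["A", "A", "A", "B", "B", "A"]

after = purify(features, before)
print(cluster_purity(before, speakers), cluster_purity(after, speakers))
```

On this toy input the single misplaced segment moves back to cluster 0, so purity rises from 5/6 to 1.0. Real systems would compare segments with speaker models rather than raw Euclidean distance.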