• DocumentCode
    72434
  • Title
    Unsupervised Methods for Speaker Diarization: An Integrated and Iterative Approach
  • Author
    Shum, Stephen H.; Dehak, Najim; Dehak, Reda; Glass, James R.
  • Author_Institution
    MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA
  • Volume
    21
  • Issue
    10
  • fYear
    2013
  • fDate
    Oct. 2013
  • Firstpage
    2015
  • Lastpage
    2028
  • Abstract
    In speaker diarization, standard approaches typically perform speaker clustering on some initial segmentation before refining the segment boundaries in a re-segmentation step to obtain a final diarization hypothesis. In this paper, we integrate an improved clustering method with an existing re-segmentation algorithm and, in iterative fashion, optimize both speaker cluster assignments and segmentation boundaries jointly. For clustering, we extend our previous research using factor analysis for speaker modeling. In continuing to take advantage of the effectiveness of factor analysis as a front-end for extracting speaker-specific features (i.e., i-vectors), we develop a probabilistic approach to speaker clustering by applying a Bayesian Gaussian Mixture Model (GMM) to principal component analysis (PCA)-processed i-vectors. We then utilize information at different temporal resolutions to arrive at an iterative optimization scheme that, in alternating between clustering and re-segmentation steps, demonstrates the ability to improve both speaker cluster assignments and segmentation boundaries in an unsupervised manner. Our proposed methods attain results that are comparable to those of a state-of-the-art benchmark set on the multi-speaker CallHome telephone corpus. We further compare our system with a Bayesian nonparametric approach to diarization and attempt to reconcile their differences in both methodology and performance.
  • Keywords
    Bayes methods; Gaussian processes; iterative methods; pattern clustering; principal component analysis; speaker recognition; Bayesian Gaussian mixture model; Bayesian nonparametric approach; GMM; PCA-processed i-vector; integrated approach; iterative optimization scheme; multispeaker CallHome telephone corpus; resegmentation algorithm; speaker cluster assignment; speaker clustering; speaker diarization; speaker-specific feature extraction; temporal resolution; unsupervised method; Bayesian nonparametric inference; HDP-HMM; factor analysis; i-vectors; spectral clustering; variational Bayes
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Audio, Speech, and Language Processing
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2013.2264673
  • Filename
    6518171