• DocumentCode
    2174169
  • Title
    Gibbs sampling based Multi-scale Mixture Model for speaker clustering
  • Author
    Watanabe, Shinji; Mochihashi, Daichi; Hori, Takaaki; Nakamura, Atsushi
  • Author_Institution
    Commun. Sci. Labs., NTT Corp., Seika, Japan
  • fYear
    2011
  • fDate
    22-27 May 2011
  • Firstpage
    4524
  • Lastpage
    4527
  • Abstract
    This work applies a sampling approach to speech modeling and proposes a Gibbs sampling based Multi-scale Mixture Model (M3). The approach focuses on the multi-scale property of speech dynamics: dynamics in speech can be observed on, for instance, short-time acoustical, linguistic-segmental, and utterance-wise temporal scales. M3 extends the Gaussian mixture model and can be regarded as a hierarchical mixture model in which the mixture components at each time scale change at intervals of the corresponding time unit. We derive a fully Bayesian treatment of the multi-scale mixture model based on Gibbs sampling. The advantage of the proposed model is that each speaker cluster can be modeled precisely with a Gaussian mixture model, unlike conventional single-Gaussian speaker clustering (e.g., clustering based on the Bayesian Information Criterion (BIC)). In addition, Gibbs sampling offers the potential to avoid the serious local-optimum problem. Speaker clustering experiments confirmed these advantages and showed a significant improvement over conventional BIC-based approaches.
  • Keywords
    Gaussian processes; pattern clustering; speaker recognition; BIC; Bayesian information criterion; Gaussian mixture model; Gibbs multiscale mixture model; Gibbs sampling; linguistic-segmental; multiscale mixture model; short-time acoustical; speaker clustering; speech dynamics modeling; utterance-wise temporal scales; Bayesian methods; Equations; Mathematical model; Fully Bayesian approach; Gaussian mixture; multi-scale mixture model
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on
  • Conference_Location
    Prague
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4577-0538-0
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2011.5947360
  • Filename
    5947360