• DocumentCode
    1363969
  • Title

    Speaker Clustering Using Decision Tree-Based Phone Cluster Models With Multi-Space Probability Distributions

  • Author

    Shen, Han-Ping ; Yeh, Jui-Feng ; Wu, Chung-Hsien

  • Author_Institution
    Dept. of Comput. Sci. & Inf. Eng., Nat. Cheng Kung Univ., Tainan, Taiwan
  • Volume
    19
  • Issue
    5
  • fYear
    2011
  • fDate
    7/1/2011 12:00:00 AM
  • Firstpage
    1289
  • Lastpage
    1300
  • Abstract
    This paper presents an approach to speaker clustering using decision tree-based phone cluster models (DT-PCMs). In this approach, phone clustering is first applied to construct universal phone cluster models that accommodate the acoustic characteristics of different speakers. Since the pitch feature is highly speaker-related and beneficial for speaker identification, decision trees are constructed based on multi-space probability distributions (MSDs), which can model both pitch and cepstral features for voiced and unvoiced speech simultaneously. In speaker clustering based on DT-PCMs, the contextual, phonetic, and prosodic features of each input speech segment are used to select the speaker-related MSDs from the MSD decision trees to construct the initial phone cluster models. The maximum-likelihood linear regression (MLLR) method is then employed to adapt the initial models to speaker-adapted phone cluster models according to the input speech segment. Finally, an agglomerative clustering algorithm is applied to all speaker-adapted phone cluster models, each representing one input speech segment, for speaker clustering. In addition, an efficient estimation method for phone model merging is proposed for model parameter combination. Experimental results show that the MSD-based DT-PCMs outperform the conventional GMM- and HMM-based approaches for speaker clustering on the RT09 tasks.
  • Keywords
    decision trees; maximum likelihood estimation; pattern clustering; speech recognition; statistical distributions; DT-PCM; GMM; HMM; MLLR; MSD decision trees; acoustic characteristics; decision tree-based phone cluster models; maximum-likelihood linear regression; multispace probability distributions; speaker identification; speaker-adapted phone cluster models; universal phone cluster models; Adaptation model; Clustering algorithms; Context modeling; Decision trees; Hidden Markov models; Speech; Speech recognition; Decision tree (DT); multi-space probability distributions (MSDs); phone clustering; speaker clustering; speech recognition
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type

    jour

  • DOI
    10.1109/TASL.2010.2090144
  • Filename
    5613154