• DocumentCode
    2279306
  • Title
    Robust speaker clustering in eigenspace
  • Author
    Faltlhauser, R. ; Ruske, G.
  • Author_Institution
    Inst. for Human-Machine-Commun., Technische Univ. Munchen, Munich, Germany
  • fYear
    2001
  • fDate
    2001
  • Firstpage
    57
  • Lastpage
    60
  • Abstract
    We propose a speaker clustering scheme working in 'eigenspace'. Speaker models are transformed to a low-dimensional subspace using 'eigenvoices'. For the speaker clustering procedure, simple distance measures, e.g. Euclidean distance, can be applied. Moreover, clustering can be accomplished with base models (for eigenvoice projection) like Gaussian mixture models as well as conventional HMMs. In case of HMMs, re-projection to the original space readily yields acoustic models. Clustering in subspace produces a well-balanced cluster and is easy to control. In the field of speaker adaptation, several principal techniques can be distinguished. The most prominent among them are Bayesian adaptation (e.g. MAP), transformation-based approaches (MLLR, maximum likelihood linear regression), as well as so-called eigenspace techniques. Especially the latter have become increasingly popular, as they make use of a-priori information about the distribution of speaker models. The basic approach is commonly called the eigenvoice (EV) approach. Besides these techniques, speaker clustering is a further attractive adaptation scheme, especially since it can be, and has been, easily combined with the above methods.
  • Keywords
    Bayes methods; Gaussian processes; eigenvalues and eigenfunctions; hidden Markov models; pattern clustering; speech recognition; Bayesian adaptation; Euclidean distance; Gaussian mixture models; HMM; MAP; a-priori information; eigenspace; eigenvoices; maximum likelihood linear regression; speaker adaptation; speaker clustering; speech recognition; Acoustic measurements; Bayesian methods; Decorrelation; Hidden Markov models; Loudspeakers; Maximum likelihood linear regression; Robustness; Spatial databases;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Automatic Speech Recognition and Understanding, 2001. ASRU '01. IEEE Workshop on
  • Print_ISBN
    0-7803-7343-X
  • Type
    conf
  • DOI
    10.1109/ASRU.2001.1034588
  • Filename
    1034588
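
The clustering idea summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each speaker model has already been flattened into a fixed-length "supervector" (for example, stacked Gaussian means), that the eigenvoices are obtained by PCA over these supervectors, and that plain k-means with Euclidean distance stands in for whatever clustering procedure the paper actually uses. All function names, the synthetic data, and the farthest-point initialisation are hypothetical choices made for illustration.

```python
import numpy as np

def eigenvoice_basis(supervectors, n_components):
    """PCA over speaker supervectors; rows of the returned basis are eigenvoices."""
    mean = supervectors.mean(axis=0)
    centered = supervectors - mean
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def project(supervectors, mean, basis):
    """Map supervectors to low-dimensional eigenspace coordinates."""
    return (supervectors - mean) @ basis.T

def reproject(coords, mean, basis):
    """Map eigenspace coordinates back to the original supervector space."""
    return coords @ basis + mean

def assign(points, centroids):
    """Index of the nearest centroid (Euclidean distance) for every point."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(points, k, n_iter=50):
    """Plain k-means with deterministic farthest-point initialisation (assumption)."""
    centroids = [points[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        labels = assign(points, centroids)
        # Keep the old centroid if a cluster happens to become empty.
        updated = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return assign(points, centroids), centroids

# Synthetic stand-in data: 10 "speakers" with 20-dimensional supervectors,
# drawn from two well-separated groups.
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 0.1, size=(5, 20))
group_b = rng.normal(0.0, 0.1, size=(5, 20)) + 3.0
supervectors = np.vstack([group_a, group_b])

mean, basis = eigenvoice_basis(supervectors, n_components=2)
coords = project(supervectors, mean, basis)        # shape (10, 2)
labels, centroids = kmeans(coords, k=2)
# Re-projecting the cluster centroids gives one supervector per cluster.
cluster_supervectors = reproject(centroids, mean, basis)
```

Clustering is done on the low-dimensional `coords` rather than on the full supervectors, which is the point the abstract makes about simple distance measures being applicable in eigenspace. The final `reproject` call mirrors the abstract's remark that re-projection to the original space yields models per cluster.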