• DocumentCode
    454698
  • Title

    Fast and Robust Speaker Clustering Using the Earth Mover's Distance and MIXMAX Models

  • Author

    Stadelmann, Thilo ; Freisleben, Bernd

  • Volume
    1
  • fYear
    2006
  • fDate
    14-19 May 2006
  • Abstract
    Speaker clustering is the task of assigning a unique label to all speech segments in a video uttered by the same speaker. There are two key challenges: processing speed and robustness in the presence of noise. In this paper, we present an approach to significantly improve the processing speed of a hierarchical speaker clustering algorithm by using the earth mover's distance (EMD) as the distance measure. By extending the well-known MIXMAX speaker model such that the EMD can be applied, noise robustness is achieved. Experimental results show that the runtime of the proposed EMD approach decreases by more than a factor of 120 compared to a likelihood-ratio-based distance measure, while the clustering performance remains nearly the same.
  • Keywords
    Gaussian processes; pattern clustering; speech processing; Gaussian mixture model; MIXMAX speaker model; earth mover's distance; hierarchical speaker clustering algorithm; noise robustness; speech segments; Automatic speech recognition; Clustering algorithms; Computer science; Earth; Hidden Markov models; Mathematical model; Mathematics; Noise cancellation; Noise robustness; Runtime
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2006 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2006), Proceedings
  • Conference_Location
    Toulouse
  • ISSN
    1520-6149
  • Print_ISBN
    1-4244-0469-X
  • Type

    conf

  • DOI
    10.1109/ICASSP.2006.1660189
  • Filename
    1660189