• DocumentCode
    1474865
  • Title

    An Information Theoretic Combination of MFCC and TDOA Features for Speaker Diarization

  • Author

    Vijayasenan, Deepu ; Valente, Fabio ; Bourlard, Hervé

  • Author_Institution
    Idiap Res. Inst., Martigny, Switzerland
  • Volume
    19
  • Issue
    2
  • fYear
    2011
  • Firstpage
    431
  • Lastpage
    438
  • Abstract
    This correspondence describes a novel system for speaker diarization of meeting recordings based on the combination of acoustic features (MFCC) and time delay of arrival (TDOA) features. The first part of the paper analyzes the differences between MFCC and TDOA features, which possess completely different statistical properties. When Gaussian mixture models are used, experiments reveal that the diarization system is sensitive to the different recording scenarios (i.e., meeting rooms with a varying number of microphones). In the second part, a new multistream diarization system is proposed, extending previous work on information theoretic diarization. Both speaker clustering and speaker realignment steps are discussed; in contrast to current systems, the proposed method avoids performing the feature combination by averaging log-likelihood scores. Experiments on meeting data reveal that the proposed approach outperforms the GMM-based system when the recording is done with a varying number of microphones.
  • Keywords
    speaker recognition; statistical analysis; time-of-arrival estimation; MFCC feature; Mel frequency cepstral coefficients; acoustic features; feature combination averaging log likelihood score; meetings recordings; multistream diarization system; speaker clustering; speaker diarization; speaker realignment; statistical property; time delay of arrival; Change detection algorithms; Delay effects; Geometry; Hidden Markov models; Iron; Loudspeakers; Mel frequency cepstral coefficient; Microphone arrays; Speech; Unsupervised learning; Feature combination; information bottleneck; meeting data
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type

    jour

  • DOI
    10.1109/TASL.2010.2048603
  • Filename
    5451107