• DocumentCode
    417220
  • Title
    Clustering and segmenting speakers and their locations in meetings
  • Author
    Ajmera, Jitendra; Lathoud, Guillaume; McCowan, Iain
  • Author_Institution
    Dalle Molle Inst. for Perceptual Artificial Intelligence, Martigny, Switzerland
  • Volume
    1
  • fYear
    2004
  • fDate
    17-21 May 2004
  • Abstract
    The paper presents a new approach to the automatic annotation of meetings in terms of speaker identities and their locations. This is achieved by segmenting the audio recordings using two independent sources of information: magnitude spectrum analysis and sound source localization. The two are combined in an HMM framework. The approach has three main advantages. First, it is completely unsupervised, i.e., the speaker identities and the number of speakers and locations are inferred automatically. Second, it is threshold-free, i.e., decisions are made without a threshold value, which generally requires an additional development dataset. Third, the joint segmentation improves on the speaker segmentation derived using acoustic features alone. Experiments on a series of meetings recorded in the IDIAP smart meeting room demonstrate the effectiveness of this approach.
  • Keywords
    audio signal processing; hidden Markov models; source separation; speaker recognition; spectral analysis; HMM; acoustic features; audio recordings; automatic meeting annotation; magnitude spectrum analysis; smart meeting room; sound source localization; speaker clustering; speaker identification; speaker location; speaker segmentation; Artificial intelligence; Audio recording; Feature extraction; Hidden Markov models; Information analysis; Information resources; Loudspeakers; Mel frequency cepstral coefficient; Microphone arrays; Speech
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-8484-9
  • Type
    conf
  • DOI
    10.1109/ICASSP.2004.1326058
  • Filename
    1326058