• DocumentCode
    19520
  • Title

    Overlapping Speech Detection Using Long-Term Conversational Features for Speaker Diarization in Meeting Room Conversations

  • Author

    Yella, Sree Harsha ; Bourlard, Herve

  • Author_Institution
    Idiap Res. Inst., Martigny, Switzerland
  • Volume
    22
  • Issue
    12
  • fYear
    2014
  • fDate
    Dec. 2014
  • Firstpage
    1688
  • Lastpage
    1700
  • Abstract
    Overlapping speech has been identified as one of the main sources of error in diarization of meeting room conversations. Therefore, overlap detection has become an important step prior to speaker diarization. Studies on conversational analysis have shown that overlapping speech is more likely to occur at specific parts of a conversation. They have also shown that overlap occurrence is correlated with various conversational features such as speech, silence patterns and speaker turn changes. We use features that capture this higher level information from the structure of a conversation, such as silence and speaker change statistics, to improve an acoustic feature based classifier of overlapping and single-speaker speech classes. The silence and speaker change statistics are computed over a long-term window (around 3-4 seconds) and are used to predict the probability of overlap in the window. These estimates are then incorporated into the acoustic feature based classifier as prior probabilities of the classes. Experiments conducted on three corpora (AMI, NIST-RT and ICSI) show that the proposed method improves the performance of the acoustic feature-based overlap detector on all the corpora. They also reveal that the model that estimates the probability of overlap from long-term conversational features, which is learned from the AMI corpus, generalizes to meetings from other corpora (NIST-RT and ICSI). Moreover, experiments on the ICSI corpus reveal that the proposed method also improves laughter overlap detection. Consequently, applying overlap handling techniques to speaker diarization using the detected overlap results in a reduction of diarization error rate (DER) on all three corpora.
  • Keywords
    speaker recognition; AMI; ICSI; NIST-RT; acoustic feature based classifier; diarization error rate; long-term conversational features; meeting room conversations; overlapping speech detection; single-speaker speech classes; speaker change statistics; speaker diarization; Acoustics; Feature extraction; Hidden Markov models; Microphones; Probability; Speech; Speech processing; Meeting room recordings; simultaneous speakers; speaker diarization; spontaneous conversations; spontaneous overlapping speech;
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    2329-9290
  • Type

    jour

  • DOI
    10.1109/TASLP.2014.2346315
  • Filename
    6874499
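
The abstract describes using overlap probabilities estimated from long-term conversational features as class priors for an acoustic classifier. The following is a minimal sketch of that idea only, not the authors' implementation: it combines per-frame acoustic log-likelihoods with a per-frame overlap prior through Bayes' rule and thresholds the posterior. The function names, the threshold, and the frame-wise decision (with no temporal model or decoding) are assumptions made for illustration.

```python
import math


def overlap_posterior(loglik_single, loglik_overlap, prior_overlap):
    """Posterior P(overlap | frame) from class log-likelihoods and a prior.

    Hypothetical helper: the record does not specify how the classifier
    is implemented, so this is plain two-class Bayes' rule.
    """
    # Clamp the prior away from 0 and 1 so the logarithms stay finite.
    eps = 1e-6
    p = min(max(prior_overlap, eps), 1.0 - eps)
    log_joint_overlap = loglik_overlap + math.log(p)
    log_joint_single = loglik_single + math.log(1.0 - p)
    # Numerically stable normalisation over the two classes (log-sum-exp).
    m = max(log_joint_overlap, log_joint_single)
    log_evidence = m + math.log(
        math.exp(log_joint_overlap - m) + math.exp(log_joint_single - m)
    )
    return math.exp(log_joint_overlap - log_evidence)


def classify_frames(logliks, priors, threshold=0.5):
    """Label each frame True (overlap) or False (single-speaker).

    logliks: list of (loglik_single, loglik_overlap) pairs, one per frame.
    priors:  list of overlap priors, one per frame; in the abstract's
             setting these would come from silence and speaker change
             statistics over a long-term window (assumed given here).
    """
    return [
        overlap_posterior(ls, lo, p) >= threshold
        for (ls, lo), p in zip(logliks, priors)
    ]


# Identical acoustic evidence, different conversational priors.
frames = [(-1.0, -1.0), (-1.0, -1.0)]
priors = [0.2, 0.8]
print(classify_frames(frames, priors))  # [False, True]
```

With equal acoustic likelihoods the decision is driven entirely by the prior, which is the effect the abstract attributes to the conversational features: they shift the classifier toward overlap in regions of the conversation where overlap is more likely.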