• DocumentCode
    2770573
  • Title
    Multiple feature combination to improve speaker diarization of telephone conversations
  • Author
    Gupta, Vishwa ; Kenny, Patrick ; Ouellet, Pierre ; Boulianne, Gilles ; Dumouchel, Pierre
  • Author_Institution
    Centre de recherche informatique de Montreal, Montreal
  • fYear
    2007
  • fDate
    9-13 Dec. 2007
  • Firstpage
    705
  • Lastpage
    710
  • Abstract
    We report results on speaker diarization of telephone conversations. The diarization process is similar to the multistage segmentation and clustering system used in broadcast news. It consists of an initial acoustic change point detection algorithm, iterative Viterbi re-segmentation, gender labeling, agglomerative clustering using a Bayesian information criterion (BIC), agglomerative clustering using state-of-the-art speaker identification (SID) methods, and Viterbi re-segmentation using Gaussian mixture models (GMMs). The Viterbi re-segmentation using GMMs is new, and it reduces the diarization error rate (DER) by 10%. We run these multistage segmentation and clustering steps twice: once with MFCCs as feature parameters for the GMMs used in the gender labeling, SID and Viterbi re-segmentation steps, and once with Gaussianized MFCCs as feature parameters for the GMMs used in these three steps. The resulting clusters from the parallel runs are combined in a novel way that leads to a significant reduction in the DER. On a development set containing 30 telephone conversations, this combination step reduced the DER by 20%. On a separate test set containing 30 telephone conversations, this step reduced the DER by 13%. The best error rates we have achieved are 6.7% on the development set and 9.0% on the test set.
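The BIC-based agglomerative clustering stage named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes each cluster is modelled by a single diagonal-covariance Gaussian, and the names `log_det_diag`, `delta_bic` and `penalty_weight` are hypothetical. It computes the commonly used delta-BIC score for two clusters of feature frames; under this sign convention a negative score favours merging them.

```python
import math


def log_det_diag(frames):
    """Log-determinant of the diagonal ML covariance of a list of frames."""
    n = len(frames)
    dim = len(frames[0])
    total = 0.0
    for k in range(dim):
        mean = sum(f[k] for f in frames) / n
        var = sum((f[k] - mean) ** 2 for f in frames) / n
        total += math.log(max(var, 1e-10))  # floor avoids log(0)
    return total


def delta_bic(x, y, penalty_weight=1.0):
    """Delta-BIC between modelling x and y separately versus jointly.

    Assumption: one diagonal-covariance Gaussian per cluster.
    Negative value -> a single model suffices (merge candidates).
    Positive value -> two models are justified (keep apart).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    dim = len(x[0])
    # Log-likelihood gain from using two Gaussians instead of one.
    gain = 0.5 * (n * log_det_diag(x + y)
                  - n1 * log_det_diag(x)
                  - n2 * log_det_diag(y))
    # Extra free parameters of the second model: one mean and one
    # variance per dimension.
    n_params = 2 * dim
    penalty = 0.5 * penalty_weight * n_params * math.log(n)
    return gain - penalty
```

In an agglomerative loop, one would typically merge the pair of clusters with the lowest delta-BIC and stop once no pair scores below zero; the penalty weight is a tuning parameter, and its value here is an assumption rather than a setting taken from the paper.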
  • Keywords
    Bayes methods; Gaussian processes; error statistics; feature extraction; gender issues; iterative methods; pattern clustering; speaker recognition; Bayesian information criterion; Gaussian mixture models; acoustic change point detection algorithm; agglomerative clustering; broadcast news; diarization error rate; gender labeling; iterative Viterbi re-segmentation; multiple feature combination; multistage segmentation-clustering system; speaker diarization process; state-of-the-art speaker identification methods; telephone conversations; Broadcasting; Density estimation robust algorithm; Detection algorithms; Error analysis; Iterative methods; Labeling; Loudspeakers; Telephony; Testing; Viterbi algorithm; BIC clustering; SID clustering; speaker diarization; speaker segmentation and clustering
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on
  • Conference_Location
    Kyoto
  • Print_ISBN
    978-1-4244-1746-9
  • Electronic_ISBN
    978-1-4244-1746-9
  • Type
    conf
  • DOI
    10.1109/ASRU.2007.4430198
  • Filename
    4430198