Title :
Segmentation of speech using speaker identification
Author :
Wilcox, Lynn ; Chen, Francine ; Kimber, Don ; Balasubramanian, Vijay
Author_Institution :
Xerox PARC, Palo Alto, CA, USA
Abstract :
This paper describes techniques for segmentation of conversational speech based on speaker identity. Speaker segmentation is performed using Viterbi decoding on a hidden Markov model network consisting of interconnected speaker sub-networks. Speaker sub-networks are initialized using Baum-Welch training on data labeled by speaker, and are iteratively retrained based on the previous segmentation. If data labeled by speaker is not available, agglomerative clustering is used to approximately segment the conversational speech according to speaker prior to Baum-Welch training. The distance measure for the clustering is a likelihood ratio in which speakers are modeled by Gaussian distributions. The distance between merged segments is recomputed at each stage of the clustering, and a duration model is used to bias the likelihood ratio. Segmentation accuracy using agglomerative clustering initialization matches accuracy using initialization with speaker-labeled data.
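The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes diagonal-covariance Gaussians, applies a small variance floor, and omits the duration model that the abstract says biases the likelihood ratio. The function names (log_det_diag, glr_distance, agglomerative_cluster) are hypothetical, and the number of target clusters is supplied by the caller as an assumption.

```python
import math
from itertools import combinations


def log_det_diag(frames):
    """Log-determinant of the ML diagonal covariance of a list of feature vectors."""
    n = len(frames)
    dim = len(frames[0])
    total = 0.0
    for k in range(dim):
        mean = sum(f[k] for f in frames) / n
        var = sum((f[k] - mean) ** 2 for f in frames) / n
        # Variance floor (an assumption of this sketch) avoids log(0).
        total += math.log(max(var, 1e-8))
    return total


def glr_distance(x, y):
    """Negative log likelihood ratio: one pooled Gaussian vs. two separate Gaussians.

    Small values suggest the two segments are well explained by a single
    Gaussian, i.e. plausibly the same speaker.
    """
    z = x + y
    return 0.5 * (len(z) * log_det_diag(z)
                  - len(x) * log_det_diag(x)
                  - len(y) * log_det_diag(y))


def agglomerative_cluster(segments, num_clusters):
    """Merge the closest pair of clusters until num_clusters remain.

    segments: list of segments, each a list of feature vectors (lists of floats).
    Returns a list of clusters, each a sorted list of segment indices.
    """
    clusters = [([i], list(seg)) for i, seg in enumerate(segments)]
    while len(clusters) > num_clusters:
        # Distances are recomputed from the pooled frames at every stage,
        # so a merged cluster is compared using its own Gaussian.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: glr_distance(clusters[p[0]][1], clusters[p[1]][1]),
        )
        merged = (clusters[i][0] + clusters[j][0],
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return [sorted(ids) for ids, _ in clusters]
```

In the pipeline the abstract outlines, the resulting clusters would serve only as an approximate labeling used to initialize the speaker sub-networks before Baum-Welch training and Viterbi resegmentation; those stages are not shown here.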
Keywords :
Gaussian distribution; Gaussian processes; Viterbi decoding; hidden Markov models; speaker recognition; speech processing; Baum-Welch training; agglomerative clustering; conversational speech segmentation; distance measure; duration model; hidden Markov model network; initialization; interconnected speaker sub-networks; likelihood ratio; segmentation accuracy; speaker identification; speaker labeled data; speaker segmentation; Cepstral analysis; Indexing; Iterative algorithms; Iterative decoding; Speech; Statistical distributions; Streaming media; Viterbi algorithm
Conference_Title :
Acoustics, Speech, and Signal Processing, 1994. ICASSP-94., 1994 IEEE International Conference on
Conference_Location :
Adelaide, SA
Print_ISBN :
0-7803-1775-0
DOI :
10.1109/ICASSP.1994.389330