DocumentCode
2659959
Title
Using hidden Markov models for topic segmentation of meeting transcripts
Author
Sherman, Melissa ; Liu, Yang
Author_Institution
Behavioral & Brain Sci., Univ. of Texas at Dallas, Dallas, TX
fYear
2008
fDate
15-19 Dec. 2008
Firstpage
185
Lastpage
188
Abstract
In this paper, we present a hidden Markov model (HMM) approach to segmenting meeting transcripts into topics. To learn the model, we use unsupervised learning to cluster the text segments obtained from topic boundary information. Using modified WinDiff and Pk metrics, we demonstrate on the ICSI meeting corpus that an HMM outperforms LCSeg, a state-of-the-art lexical-chain-based method for topic segmentation. We evaluate the effect of language model (LM) order, the number of hidden states, and the use of stop words. Our experimental results show that a unigram LM is better than a trigram LM, that using too many hidden states degrades topic segmentation performance, and that removing the stop words from the transcripts does not improve segmentation performance.
Keywords
hidden Markov models; information analysis; unsupervised learning; Pk metrics; hidden Markov model; language model order; lexical chain; stop words; text segment clustering; topic boundary information; topic segmentation performance; unsupervised learning; Broadcasting; Coherence; Computer science; Decision trees; Degradation; Feature extraction; Hidden Markov models; Machine learning algorithms; Speech analysis; Unsupervised learning; Hidden Markov Model; LCSeg; Meeting Transcript; Topic Segmentation;
fLanguage
English
Publisher
IEEE
Conference_Titel
Spoken Language Technology Workshop, 2008. SLT 2008. IEEE
Conference_Location
Goa
Print_ISBN
978-1-4244-3471-8
Electronic_ISBN
978-1-4244-3472-5
Type
conf
DOI
10.1109/SLT.2008.4777871
Filename
4777871