• DocumentCode
    2659959
  • Title
    Using hidden Markov models for topic segmentation of meeting transcripts
  • Author
    Sherman, Melissa ; Liu, Yang
  • Author_Institution
    Behavioral & Brain Sci., Univ. of Texas at Dallas, Dallas, TX
  • fYear
    2008
  • fDate
    15-19 Dec. 2008
  • Firstpage
    185
  • Lastpage
    188
  • Abstract
    In this paper, we present a hidden Markov model (HMM) approach to segmenting meeting transcripts into topics. To learn the model, we use unsupervised learning to cluster the text segments obtained from topic boundary information. Using modified WinDiff and Pk metrics, we demonstrate that on the ICSI meeting corpus an HMM outperforms LCSeg, a state-of-the-art lexical-chain-based method for topic segmentation. We evaluate the effect of language model (LM) order, the number of hidden states, and the use of stop words. Our experimental results show that a unigram LM is better than a trigram LM, that using too many hidden states degrades topic segmentation performance, and that removing stop words from the transcripts does not improve segmentation performance.
  • Keywords
    hidden Markov models; information analysis; unsupervised learning; Pk metrics; hidden Markov model; language model order; lexical chain; stop words; text segment clustering; topic boundary information; topic segmentation performance; Broadcasting; Coherence; Computer science; Decision trees; Degradation; Feature extraction; Machine learning algorithms; Speech analysis; LCSeg; Meeting Transcript; Topic Segmentation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Spoken Language Technology Workshop, 2008. SLT 2008. IEEE
  • Conference_Location
    Goa
  • Print_ISBN
    978-1-4244-3471-8
  • Electronic_ISBN
    978-1-4244-3472-5
  • Type
    conf
  • DOI
    10.1109/SLT.2008.4777871
  • Filename
    4777871