• Title of article

    Text segmentation: A topic modeling perspective

  • Author/Authors

    Hemant Misra, François Yvon, Olivier Cappé, Joemon Jose

  • Issue Information
    Bimonthly, serial issue of year 2011
  • Pages
    17
  • From page
    528
  • To page
    544
  • Abstract
    In this paper, the task of text segmentation is approached from a topic modeling perspective. We investigate the use of two unsupervised topic models, latent Dirichlet allocation (LDA) and multinomial mixture (MM), to segment a text into semantically coherent parts. The proposed topic model based approaches consistently outperform a standard baseline method on several datasets. A major benefit of the proposed LDA based approach is that along with the segment boundaries, it outputs the topic distribution associated with each segment. This information is of potential use in applications such as segment retrieval and discourse analysis. However, the proposed approaches, especially the LDA based method, have high computational requirements. Based on an analysis of the dynamic programming (DP) algorithm typically used for segmentation, we suggest a modification to DP that dramatically speeds up the process with no loss in performance. The proposed modification to the DP algorithm is not specific to the topic models only; it is applicable to all the algorithms that use DP for the task of text segmentation.
  • Keywords
    Text segmentation, Dynamic programming, Latent Dirichlet Allocation, Semantic Information, Topic modeling
  • Journal title
    Information Processing and Management
  • Serial Year
    2011
  • Record number

    1229133