• DocumentCode
    256703
  • Title
    Research on Multi-document Summarization Based on LDA Topic Model
  • Author
    Jinqiang Bian; Zengru Jiang; Qian Chen
  • Author_Institution
    Sch. of Autom., Beijing Inst. of Technol., Beijing, China
  • Volume
    2
  • fYear
    2014
  • fDate
    26-27 Aug. 2014
  • Firstpage
    113
  • Lastpage
    116
  • Abstract
    Compared with the VSM (Vector Space Model) and graph-ranking models, the LDA (Latent Dirichlet Allocation) model can discover latent topics in a corpus, and these latent topics support sentence-ranking mechanisms that produce a good summary. This paper proposes a new sentence-ranking method based on the LDA model. The method combines the topic distribution of each sentence with the topic importance of the corpus to calculate the posterior probability of the sentence, and then selects sentences according to this probability to form a summary. The topic distribution of a sentence represents the likelihood that the sentence belongs to each topic, and topic importance represents the degree to which a topic covers a significant portion of the corpus. The method highlights the latent topics and optimizes the summarization. Experimental results on the DUC2006 dataset show the advantage of the proposed multi-document summarization algorithm: its ROUGE scores improve on those of methods such as LexRank, LDA-SIBS, and LDA-PGS.
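    The abstract describes a ranking scheme that combines each sentence's topic distribution with corpus-level topic importance. The Python sketch below is a minimal, hypothetical illustration of that idea, not the authors' implementation: it treats each sentence as an LDA "document", approximates topic importance by averaging the sentence-topic distributions, and scores each sentence by weighting its topic distribution with those importances. The library choice (scikit-learn), the function name rank_sentences, and all parameter values are assumptions.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation
    import numpy as np

    def rank_sentences(sentences, num_topics=10, summary_size=5):
        # Bag-of-words representation: every sentence is treated as a document.
        vectorizer = CountVectorizer(stop_words="english")
        counts = vectorizer.fit_transform(sentences)

        # Fit LDA; theta[i, k] approximates P(topic k | sentence i).
        lda = LatentDirichletAllocation(n_components=num_topics, random_state=0)
        theta = lda.fit_transform(counts)

        # Topic importance: how strongly each topic covers the corpus,
        # approximated here by the mean topic weight over all sentences.
        topic_importance = theta.mean(axis=0)

        # Sentence score: its topic distribution weighted by topic importance.
        scores = theta @ topic_importance

        # Keep the highest-scoring sentences, returned in original order
        # (no redundancy check, unlike a full summarizer).
        top = np.argsort(scores)[::-1][:summary_size]
        return [sentences[i] for i in sorted(top)]

    if __name__ == "__main__":
        docs = [
            "The LDA model discovers latent topics in a document collection.",
            "Latent topics help rank sentences for multi-document summarization.",
            "ROUGE is a common metric for evaluating automatic summaries.",
            "Graph-ranking models such as LexRank score sentences by centrality.",
        ]
        print(rank_sentences(docs, num_topics=2, summary_size=2))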
  • Keywords
    information retrieval; probability; text analysis; LDA topic model; ROUGE value; Latent Dirichlet Allocation; latent topic; multi-document summarization; posterior probability; sentence-ranking mechanism; topic distribution; topic importance; Data mining; Information retrieval; Probability distribution; Resource management; Semantics; Vectors; LDA; Multi-document summarization; Topic Model
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Human-Machine Systems and Cybernetics (IHMSC), 2014 Sixth International Conference on
  • Conference_Location
    Hangzhou
  • Print_ISBN
    978-1-4799-4956-4
  • Type
    conf
  • DOI
    10.1109/IHMSC.2014.130
  • Filename
    6911461