• DocumentCode
    3315017
  • Title
    Subtopic-based Multi-documents Summarization
  • Author
    Gong, Shu ; Qu, Youli ; Tian, ShengFeng
  • Author_Institution
    Sch. of Comput. & Inf. Technol., Beijing Jiaotong Univ., Beijing, China
  • Volume
    2
  • fYear
    2010
  • fDate
    28-31 May 2010
  • Firstpage
    382
  • Lastpage
    386
  • Abstract
    Multi-document summarization is an important research area in NLP. Most multi-document summarization methods either treat the document collection as covering a single topic or treat every sentence as covering a single topic, and they lack a systematic analysis of the subtopic semantics hidden inside the documents. This paper presents a Subtopic-based Multi-documents Summarization (SubTMS) method. It adopts a probabilistic topic model to discover the subtopic information inside every sentence and uses a suitable hierarchical subtopic structure to describe both the whole document collection and all sentences in it. With the sentences represented as subtopic vectors, it assesses the semantic distance of each sentence from the document collection's main subtopics and chooses the sentences with the shortest distances as the final summary of the document collection. In experiments on the DUC 2007 dataset, we found that when a topic's document collection is trained with some other topics' document collections as background knowledge, our approach achieves better ROUGE scores than other peer systems.
  • Keywords
    document handling; natural language processing; probability; security of data; probabilistic topic model; subtopic-based multidocuments summarization method; Data mining; Data preprocessing; Feature extraction; Frequency; Information processing; Information technology; Natural language processing; Statistics; Vocabulary; Web pages; multi-documents summarization; sentence representation; subtopic; topic model
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Science and Optimization (CSO), 2010 Third International Joint Conference on
  • Conference_Location
    Huangshan, Anhui
  • Print_ISBN
    978-1-4244-6812-6
  • Electronic_ISBN
    978-1-4244-6813-3
  • Type
    conf
  • DOI
    10.1109/CSO.2010.239
  • Filename
    5533141
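
The selection step described in the abstract (sentences represented as subtopic vectors, ranked by their distance from the collection's main subtopics) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' SubTMS implementation: the abstract does not specify the distance measure, how the main subtopics are identified, or the hierarchical subtopic structure. Here the subtopic vectors are assumed to be given already (for example, from a topic model), the collection profile is assumed to be the mean of the sentence vectors, and Euclidean distance over the top-weighted subtopics is a stand-in. The names select_summary, n_main, and n_pick are hypothetical.

```python
import math


def select_summary(sentences, subtopic_vectors, n_main=2, n_pick=2):
    """Return the n_pick sentences closest to the collection's main subtopics.

    Assumptions (not taken from the paper): the collection profile is the
    mean of the sentence vectors, the main subtopics are the n_main
    highest-weighted entries of that profile, and closeness is Euclidean
    distance restricted to those entries.
    """
    n_topics = len(subtopic_vectors[0])
    n_sents = len(subtopic_vectors)

    # Collection-level subtopic profile: average weight of each subtopic.
    profile = [
        sum(vec[k] for vec in subtopic_vectors) / n_sents
        for k in range(n_topics)
    ]

    # Indices of the subtopics with the largest average weight.
    main = sorted(range(n_topics), key=lambda k: profile[k], reverse=True)[:n_main]

    def distance(vec):
        # Distance from the profile, measured only on the main subtopics.
        return math.sqrt(sum((vec[k] - profile[k]) ** 2 for k in main))

    # Rank sentences by distance, keep the closest, restore original order.
    ranked = sorted(range(n_sents), key=lambda i: distance(subtopic_vectors[i]))
    chosen = sorted(ranked[:n_pick])
    return [sentences[i] for i in chosen]


# Toy data: four sentences with made-up weights over three subtopics.
example_sentences = ["s0", "s1", "s2", "s3"]
example_vectors = [
    [0.6, 0.3, 0.1],
    [0.1, 0.1, 0.8],
    [0.5, 0.4, 0.1],
    [0.2, 0.7, 0.1],
]
example_summary = select_summary(example_sentences, example_vectors)
```

On this toy input the two sentences whose weights sit nearest the average on the two dominant subtopics are kept, in their original order. A real system would replace the toy vectors with per-sentence topic distributions and would likely add a length budget and redundancy control, neither of which the abstract describes.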