Title :
Multi-document summarization based on hierarchical topic model
Author :
Liu, Hongyan ; Li, Lei
Author_Institution :
Center for Intell. Sci. & Technol., Beijing Univ. of Posts & Telecommun., Beijing, China
Abstract :
In this paper, we introduce an extractive multi-document summarization method based on a hierarchical topic model, hierarchical Latent Dirichlet Allocation (hLDA), combined with sentence compression. hLDA is a representative generative probabilistic model that not only mines latent topics from large amounts of discrete data but also organizes these topics into a hierarchy, enabling deeper semantic analysis. We also use sentence compression to refine the summaries and make them more concise. We use the TAC 2010 data sets as the test corpus and the ROUGE method to evaluate our summaries. The evaluations confirm that our method performs better than some traditional methods.
Keywords :
data compression; document handling; statistical analysis; ROUGE method; hierarchical latent Dirichlet allocation model; hierarchical topic model; multidocument summarization; semantic analysis; sentence compression technology; hierarchical Latent Dirichlet Allocation topic model; multi-document summarization; sentence compression
Conference_Title :
Natural Language Processing and Knowledge Engineering (NLP-KE), 2011 7th International Conference on
Conference_Location :
Tokushima
Print_ISBN :
978-1-61284-729-0
DOI :
10.1109/NLPKE.2011.6138174