DocumentCode
2710106
Title
Latent Dirichlet Allocation and Singular Value Decomposition Based Multi-document Summarization
Author
Arora, Rachit ; Ravindran, Balaraman
Author_Institution
Comput. Sci. & Eng., Indian Inst. of Technol., Chennai
fYear
2008
fDate
15-19 Dec. 2008
Firstpage
713
Lastpage
718
Abstract
Multi-Document Summarization deals with computing a summary for a set of related articles such that it gives the user a general view of the events. One objective is for the summary sentences to cover the different events in the documents in as few sentences as possible. Latent Dirichlet Allocation (LDA) can break down these documents into different topics or events. However, to reduce the common information content, the sentences of the summary need to be orthogonal to each other, since orthogonal vectors have the lowest possible similarity and correlation between them. Singular Value Decomposition (SVD) is used to get orthogonal representations of vectors, and by representing sentences as vectors, we can get the sentences that are orthogonal to each other in the LDA mixture model weighted term domain. Thus, using LDA we find the different topics in the documents, and using SVD we find the sentences that best represent these topics. Finally, we present an evaluation of the algorithms on the DUC 2002 corpus multi-document summarization tasks, using the ROUGE evaluator to score the summaries. Compared to the DUC 2002 winners, our algorithms gave significantly better ROUGE-1 recall measures.
Keywords
abstracting; document handling; singular value decomposition; vectors; DUC2002 Corpus multidocument summarization tasks; LDA mixture model weighted term domain; ROUGE evaluator; latent Dirichlet allocation; orthogonal representations; orthogonal vectors; singular value decomposition; Bayesian methods; Computer science; Context modeling; Data engineering; Data mining; Frequency; Joining processes; Linear discriminant analysis; Probability distribution; Singular value decomposition; Multi-Document Summarization; Natural Language Processing;
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining, 2008. ICDM '08. Eighth IEEE International Conference on
Conference_Location
Pisa
ISSN
1550-4786
Print_ISBN
978-0-7695-3502-9
Type
conf
DOI
10.1109/ICDM.2008.55
Filename
4781167