Title :
Latent Dirichlet Allocation and Singular Value Decomposition Based Multi-document Summarization
Author :
Arora, Rachit ; Ravindran, Balaraman
Author_Institution :
Comput. Sci. & Eng., Indian Inst. of Technol., Chennai
Abstract :
Multi-Document Summarization deals with computing a summary for a set of related articles such that it gives the user a general view of the events. One of the objectives is that the sentences should cover the different events in the documents, with the information conveyed in as few sentences as possible. Latent Dirichlet Allocation (LDA) can break down these documents into different topics or events. However, to reduce the common information content, the sentences of the summary need to be orthogonal to each other, since orthogonal vectors have the lowest possible similarity and correlation between them. Singular Value Decomposition (SVD) is used to obtain orthogonal representations of vectors; by representing sentences as vectors, we can obtain sentences that are orthogonal to each other in the LDA mixture model weighted term domain. Thus, using LDA we find the different topics in the documents, and using SVD we find the sentences that best represent these topics. Finally, we present an evaluation of the algorithms on the DUC2002 Corpus multi-document summarization tasks, using the ROUGE evaluator to score the summaries. Compared to the DUC 2002 winners, our algorithms gave significantly better ROUGE-1 recall measures.
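The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the weighting scheme, so the code assumes each term is weighted by its probability under the topic mixture, and it assumes the LDA outputs (`word_given_topic`, `topic_weights`) come from a model fitted elsewhere. The function name `select_sentences` and all parameter names are hypothetical. Sentences are chosen by taking, for each leading right singular vector, the not-yet-chosen sentence with the largest absolute component.

```python
import numpy as np


def select_sentences(counts, word_given_topic, topic_weights, n_pick):
    """Pick sentence indices using SVD on a topic-weighted term-sentence matrix.

    counts:           (n_terms, n_sentences) raw term counts per sentence.
    word_given_topic: (n_topics, n_terms) rows of P(term | topic), assumed to
                      come from an already-fitted LDA model.
    topic_weights:    (n_topics,) P(topic | document set), also assumed given.
    n_pick:           number of sentences to select.
    """
    counts = np.asarray(counts, dtype=float)
    # Assumed weighting: probability of each term under the topic mixture.
    term_weight = np.asarray(topic_weights) @ np.asarray(word_given_topic)
    weighted = counts * term_weight[:, None]

    # Rows of vt are orthonormal directions over the sentences,
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(weighted, full_matrices=False)

    picked = []
    for row in vt[:n_pick]:
        # Take the strongest sentence for this direction that is not yet used.
        for idx in np.argsort(-np.abs(row)):
            if int(idx) not in picked:
                picked.append(int(idx))
                break
    return picked
```

Because each singular direction is orthogonal to the others, the sentences picked this way tend to overlap little in content, which is the property the abstract motivates.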
Keywords :
abstracting; document handling; singular value decomposition; vectors; DUC2002 Corpus multidocument summarization tasks; LDA mixture model weighted term domain; ROUGE evaluator; latent Dirichlet allocation; orthogonal representations; orthogonal vectors; Bayesian methods; Computer science; Context modeling; Data engineering; Data mining; Frequency; Joining processes; Linear discriminant analysis; Probability distribution; Multi-Document Summarization; Natural Language Processing;
Conference_Title :
Data Mining, 2008. ICDM '08. Eighth IEEE International Conference on
Conference_Location :
Pisa
Print_ISBN :
978-0-7695-3502-9
DOI :
10.1109/ICDM.2008.55