DocumentCode :
1759329
Title :
Ranking Through Clustering: An Integrated Approach to Multi-Document Summarization
Author :
Xiaoyan Cai ; Wenjie Li
Author_Institution :
Coll. of Inf. Eng., Northwest A&F Univ., Xi'an, China
Volume :
21
Issue :
7
fYear :
2013
fDate :
July 2013
Firstpage :
1424
Lastpage :
1433
Abstract :
Multi-document summarization aims to create a condensed summary while retaining the main characteristics of the original set of documents. In this context, sentence ranking has so far been the issue of most concern. Since documents often cover a number of topic themes, with each theme represented by a cluster of highly related sentences, sentence clustering has been explored in the literature in order to provide more informative summaries. For each topic theme, the rank of terms conditional on that theme should be distinctive and quite different from the rank of terms in other topic themes. Existing cluster-based summarization approaches apply clustering and ranking in isolation, which leads to incomplete, or sometimes rather biased, analytical results. A recently proposed framework uses sentence clustering results to improve or refine the sentence ranking results. Under this framework, this paper proposes a novel approach that directly generates clusters integrated with ranking. The basic idea of the approach is that the ranking distributions of sentences in different clusters should be quite different from each other; these distributions may therefore serve as features of the clusters, from which new clustering measures of sentences can be calculated. Meanwhile, better clustering results can lead to better ranking results. As a result, ranking and clustering mutually and simultaneously update each other, so that the performance of both can be improved. The effectiveness of the proposed approach is demonstrated by both the cluster quality analysis and the summarization evaluation conducted on the DUC 2004-2007 datasets.
Keywords :
document handling; pattern clustering; statistical analysis; DUC 2004-2007 datasets; cluster quality analysis; cluster-based summarization approach; informative summaries; multidocument summarization; sentence clustering; sentence ranking distribution; Clustering algorithms; Equations; Estimation; Hidden Markov models; Mathematical model; Semantics; Vectors; Document summarization; sentence clustering; sentence ranking;
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2013.2253098
Filename :
6480794