DocumentCode :
2313523
Title :
Centroid Based Summarization of Multiple Documents Implemented Using Timestamps
Author :
Nedunchelian, R.
Author_Institution :
Dept. of Comput. Sci. & Eng., Sri Venkateswara Coll. of Eng., Pennalur
fYear :
2008
fDate :
16-18 July 2008
Firstpage :
480
Lastpage :
485
Abstract :
We propose a multiple-document summarization system with user interaction. The system extracts a summary from multiple documents based on the document cluster centroid, which is effectively the distribution of terms across the documents in the cluster. The technique is a cluster-based, extractive summarization method: passages are first clustered by similarity, and the passages that form the extractive summary are then selected. Each sentence is issued a timestamp based on its order of occurrence in the original document, which preserves the chronological order of sentences. Passage clustering is a main component of the system, which aims to extract the most relevant sentences from the documents while keeping the summary non-redundant. The implementation is based on the MEAD extraction algorithm and a redundancy-based algorithm. The MEAD extraction algorithm uses three features to compute the salience of a sentence: centroid value, positional value, and first-sentence overlap. The redundancy algorithm checks for overlapping words between sentences and issues a redundancy penalty. Because timestamps maintain the chronological order of the sentences, a coherent and free-flowing summary can be generated.
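The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract names the three salience features, the redundancy penalty, and the timestamps, but gives no formulas, so everything below is an assumption made for illustration. In particular, the centroid is approximated by average term frequency per document, the positional value decays linearly, the redundancy measure is a word-overlap ratio, the penalty is applied multiplicatively, and the timestamp is a hypothetical `(document index, sentence index)` pair. All function names and weights (`w_c`, `w_p`, `w_f`, `w_r`) are likewise hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def centroid(docs):
    """Assumed centroid: average count of each term per document."""
    total = Counter()
    for sentences in docs:
        for sentence in sentences:
            total.update(tokenize(sentence))
    return {term: count / len(docs) for term, count in total.items()}


def score_sentences(docs, w_c=1.0, w_p=1.0, w_f=1.0):
    """Score every sentence with the three features named in the abstract."""
    cen = centroid(docs)
    scored = []
    for d, sentences in enumerate(docs):
        first = Counter(tokenize(sentences[0]))
        n = len(sentences)
        for i, sentence in enumerate(sentences):
            tf = Counter(tokenize(sentence))
            c = sum(cen[t] for t in tf)             # centroid value
            p = (n - i) / n                         # positional value (assumed linear decay)
            f = sum(tf[t] * first[t] for t in tf)   # first-sentence overlap
            scored.append({
                "text": sentence,
                "timestamp": (d, i),                # assumed timestamp: order of occurrence
                "score": w_c * c + w_p * p + w_f * f,
            })
    return scored


def redundancy(a, b):
    """Assumed overlap ratio in [0, 1]: shared words over total distinct words."""
    wa, wb = set(tokenize(a)), set(tokenize(b))
    if not wa or not wb:
        return 0.0
    return 2 * len(wa & wb) / (len(wa) + len(wb))


def summarize(docs, k=2, w_r=1.0):
    """Greedily pick k sentences, penalizing overlap with those already chosen."""
    candidates = score_sentences(docs)
    chosen = []

    def penalized(cand):
        penalty = max((redundancy(cand["text"], s["text"]) for s in chosen),
                      default=0.0)
        return cand["score"] * (1 - w_r * penalty)

    while candidates and len(chosen) < k:
        best = max(candidates, key=penalized)
        candidates.remove(best)
        chosen.append(best)

    # Restore chronological order using the timestamps.
    chosen.sort(key=lambda s: s["timestamp"])
    return [s["text"] for s in chosen]
```

Calling `summarize(docs, k=3)` on a list of documents, each given as a list of sentence strings, returns up to three sentences in timestamp order. A sentence that duplicates one already selected receives a penalized score of zero and is passed over while other candidates remain.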
Keywords :
pattern clustering; text analysis; Automatic text summarization; MEAD extraction algorithm; chronological sentence order; document cluster centroid; extractive summarization method; first-sentence overlap; multiple-document summarization system; passage clustering; redundancy algorithm; time stamp; user interaction; Clustering algorithms; Communications technology; Computer science; Data mining; Educational institutions; Information processing; Information retrieval; Internet; Natural languages; Search engines; Centroid and Timestamp; MEAD; Multi-document summarization;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Emerging Trends in Engineering and Technology, 2008. ICETET '08. First International Conference on
Conference_Location :
Nagpur, Maharashtra
Print_ISBN :
978-0-7695-3267-7
Electronic_ISBN :
978-0-7695-3267-7
Type :
conf
DOI :
10.1109/ICETET.2008.122
Filename :
4579948