DocumentCode :
2895419
Title :
Research on Multi-document Summarization Using Lexical Cohesion
Author :
Chen, Yanmin ; Lou, Xizhong ; Pan, Julong
Author_Institution :
Coll. of Inf. Eng., China Jiliang Univ., Hangzhou, China
fYear :
2009
fDate :
7-8 Nov. 2009
Firstpage :
118
Lastpage :
122
Abstract :
This paper investigates the use of lexical cohesion to generate a moderately fluent semantic summary from a collection of documents written in Chinese. Building on a cohesion-analysis algorithm that exploits the relationships among words in the HowNet knowledge database, the system computes concept frequency rather than word frequency as its measure of importance. It combines lexical-semantic analysis with several summarization principles to remove redundancy while retaining the differences among the documents. This approach reduces the information loss caused by vocabulary switching during summarization, and it relies on a more general notion of relatedness based on lexical semantics, so that more distant relationships between words can be taken into account. Evaluation results show that the presented system performs clearly better than the baseline system. The system can be applied to online web text processing.
Keywords :
natural language processing; text analysis; vocabulary; Chinese documents; HowNet knowledge database; Web text processing; cohesion analysis; concept frequency; lexical cohesion; lexical semantics; multidocument summarization; semantic summary; vocabulary switching; word relationship; Algorithm design and analysis; Databases; Educational institutions; Frequency measurement; Information analysis; Information systems; Mice; Text processing; Text recognition; Vocabulary; Hownet; Lexical Cohesion; Summarization;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Web Information Systems and Mining, 2009. WISM 2009. International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-0-7695-3817-4
Type :
conf
DOI :
10.1109/WISM.2009.32
Filename :
5368180