DocumentCode :
2328346
Title :
Multi-document summarization based on lexical chains
Author :
Chen, Yan-Min ; Wang, Xiao-long ; Liu, Bing-quan
Author_Institution :
Dept. of Comput. Sci. & Eng., Harbin Inst. of Technol., China
Volume :
3
fYear :
2005
fDate :
18-21 Aug. 2005
Firstpage :
1937
Abstract :
This paper investigates, for the first time, the use of lexical chains as a model of multiple Chinese-language documents to generate an indicative, moderately fluent summary. The algorithm that computes lexical chains from the HowNet knowledge database is modified to improve performance and to suit Chinese summarization. Based on semanteme analysis, the algorithm removes redundant similarities and retains differences in information content among the documents. The method first pre-processes the text, then constructs lexical chains and identifies strong chains. Significant sentences are then extracted from each document and ordered, and redundant information is recognized and removed. Finally, the summary is generated in chronological order, and anaphora resolution is applied to improve its fluency. Evaluation results show that the presented system performs clearly better than the baseline system and that lexical chains are effective for multi-document summarization.
Keywords :
data mining; natural languages; text analysis; Chinese language; Chinese summarization; HowNet knowledge database; anaphora resolution; information content; lexical chains; multidocument summarization; redundant information recognition; semanteme analysis; sentence extraction; sentence ordering; text processing; Algorithm design and analysis; Computer science; Cybernetics; Data mining; Databases; Electronic mail; Information analysis; Information retrieval; Machine learning; HowNet; Multi-document summarization; cohesion; lexical chains; semanteme;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on
Conference_Location :
Guangzhou, China
Print_ISBN :
0-7803-9091-1
Type :
conf
DOI :
10.1109/ICMLC.2005.1527262
Filename :
1527262