Sentences clustering based automatic summarization

Author

Wang, Jian-hui ; Zhou, Shui-geng ; Hu, Yun-fa

Author_Institution

Dept. of Comput. & Information Technol., Fudan Univ., Shanghai, China

Volume

1

fYear

2003

fDate

2-5 Nov. 2003

Firstpage

57

Abstract

Research on automatic summarization follows two main approaches: one based on statistics, the other based on message understanding. The statistical approach is domain-independent but less accurate, whereas the understanding-based approach is more accurate but depends on the domain. This paper presents an algorithm that summarizes a document by extracting subtopics from its sentences. It combines statistics with partial message understanding in order to produce better summaries while removing the dependence on domain. Because it is difficult to determine the length of a summary manually, the algorithm also aims to produce a summary of appropriate length. To this end, a new module of mutual dependence is proposed and applied to segmentation, so that accurate features can be selected for the summarization algorithm. New rules are then introduced to evaluate sentences for the summarization algorithm. Finally, a new task-based algorithm for objectively evaluating summaries is offered.

Keywords

pattern clustering; statistics; text analysis; automatic summarization; message understanding; mutual dependence; segmentation; sentences clustering; Clustering algorithms; Concrete; Dictionaries; Information technology

fLanguage

English

Publisher

IEEE

Conference_Titel

Machine Learning and Cybernetics, 2003 International Conference on

Print_ISBN

0-7803-8131-9

Type

conf

DOI

10.1109/ICMLC.2003.1264442

Filename

1264442