DocumentCode :
1921617
Title :
A study of Chinese text summarization using adaptive clustering of paragraphs
Author :
Hu, Po ; He, Tingting ; Ji, DongHong ; Wang, Meng
Author_Institution :
Dept. of Comput. Sci., Central China Normal Univ., Wuhan, China
fYear :
2004
fDate :
14-16 Sept. 2004
Firstpage :
1159
Lastpage :
1164
Abstract :
Automatic summarization is an important research issue in natural language processing. This paper presents a summarization method that generates a single-document summary with maximum topic completeness and minimum redundancy. It first builds semantic-class-based vector representations of the various kinds of linguistic units in a document by means of HowNet (an existing ontology), which can improve, to a certain degree, on the representation quality of the traditional term-based vector space model. Then, by adopting the K-means clustering algorithm together with a clustering analysis algorithm, it determines the number of different latent topic regions in a document adaptively. Finally, topic-representative sentences are selected from each topic region to form the final summary. To evaluate the effectiveness of the proposed summarization method, a novel metric known as representation entropy is used to evaluate summarization redundancy. Preliminary experimental results show that the proposed method outperforms the conventional basic summarization method under this evaluation scheme when dealing with diverse genres of Chinese documents with free writing style and flexible topic distribution.
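The clustering and selection steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the HowNet-based semantic-class vectors are replaced by hypothetical toy vectors, the number of clusters `k` is passed in by hand instead of being found by the paper's clustering analysis algorithm, and the function names `kmeans` and `representatives` are invented for this illustration.

```python
import math


def kmeans(vectors, k, max_iter=50):
    """Plain K-means on lists of floats.

    Assumption: the first k vectors seed the centroids, which keeps the
    sketch deterministic. The paper chooses k adaptively; here it is given.
    """
    centroids = [list(v) for v in vectors[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: attach each vector to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        if new_assignment == assignment:
            break  # converged: no unit changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


def representatives(vectors, assignment, centroids):
    """Pick, for each cluster, the index of the unit closest to its centroid."""
    chosen = []
    for c, centroid in enumerate(centroids):
        members = [i for i, a in enumerate(assignment) if a == c]
        if members:
            chosen.append(min(members, key=lambda i: math.dist(vectors[i], centroid)))
    return sorted(chosen)


# Hypothetical toy vectors standing in for sentence or paragraph representations.
toy_vectors = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0],
               [10.0, 11.0], [0.0, 0.5], [10.0, 10.5]]
toy_assignment, toy_centroids = kmeans(toy_vectors, k=2)
summary_indices = representatives(toy_vectors, toy_assignment, toy_centroids)
```

Returning the selected indices in document order keeps the extracted units in their original sequence, which is one plausible way to assemble the final summary.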
Keywords :
abstracting; computational linguistics; linguistics; natural languages; ontologies (artificial intelligence); pattern clustering; text analysis; Chinese document; Chinese text summarization; HowNet ontology; K-means clustering algorithm; adaptive paragraph clustering; automatic summarization; clustering analysis algorithm; final summary; flexible topic distribution; free writing style; latent topic regions; linguistic units; maximum topic completeness; minimum redundancy; natural language processing; representation entropy; semantic-class-based vector representations; single-document summary; summarization redundancy evaluation; term-based vector space model; topic representative sentences; Algorithm design and analysis; Clustering algorithms; Computer science; Entropy; Helium; Internet; Natural language processing; Ontologies; Partitioning algorithms; Writing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer and Information Technology, 2004. CIT '04. The Fourth International Conference on
Print_ISBN :
0-7695-2216-5
Type :
conf
DOI :
10.1109/CIT.2004.1357351
Filename :
1357351