Title :
A Novel Chinese Text Summarization Approach Using Sentence Extraction Based on Kernel Words Recognition
Author :
Yang, Weijie ; Dai, Ruwei ; Cui, Xia
Author_Institution :
Key Lab. of Complex Syst. & Intell. Sci., Chinese Acad. of Sci., Beijing
Abstract :
The continuing growth of the World Wide Web and on-line text collections makes a large volume of information available to users, and automatic text summarization helps them understand documents quickly. This paper proposes an automated technique for Chinese document summarization based on kernel words recognition and discourse segment extraction. The method consists of five steps. First, the input articles are annotated through lexical analysis. Second, all focused named entities are recognized using a machine learning method. Third, the input articles are divided into discourse segments, the kernel words of each segment are extracted by rule-based main verbs recognition, and the relations among entities are extracted. Fourth, the candidate sentences are ranked by importance according to a set of rules, and redundant sentences are removed using kernel words information. Finally, the most important sentences are selected to compose the summary according to the expected compression ratio, and these important sentences are output using a special document as reference. A series of experiments is performed on two Chinese document collections. The results show the superiority of the proposed technique over reference systems.
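The fourth and fifth steps (ranking, redundancy removal, and selection under a compression ratio) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the ranking rules or the redundancy criterion, so the precomputed `score` field, the Jaccard overlap test, the `max_overlap` threshold, and all names below are assumptions made for illustration only.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class Sentence:
    index: int               # position of the sentence in the source document
    text: str
    kernel_words: frozenset  # assumed output of the kernel-word extraction step
    score: float             # assumed output of the rule-based ranking step


def jaccard(a, b):
    """Overlap between two kernel-word sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def summarize(sentences, compression_ratio, max_overlap=0.5):
    """Select top-scoring, non-redundant sentences and return them in document order.

    A candidate counts as redundant (an assumption, not the paper's rule) when its
    kernel words overlap an already selected sentence by more than max_overlap.
    """
    if not sentences:
        return []
    # Number of sentences allowed by the expected compression ratio (at least one).
    budget = max(1, ceil(len(sentences) * compression_ratio))
    chosen = []
    # Visit candidates from highest to lowest score; ties go to the earlier sentence.
    for cand in sorted(sentences, key=lambda s: (-s.score, s.index)):
        if len(chosen) == budget:
            break
        redundant = any(
            jaccard(cand.kernel_words, kept.kernel_words) > max_overlap
            for kept in chosen
        )
        if not redundant:
            chosen.append(cand)
    # Restore the original order so the summary reads naturally.
    return sorted(chosen, key=lambda s: s.index)
```

In this sketch a lower-scored sentence is dropped when it shares most of its kernel words with a sentence already selected, and the next distinct candidate takes its place in the budget.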
Keywords :
Internet; feature extraction; learning (artificial intelligence); text analysis; Chinese document summarization; Chinese text summarization; discourse segment extraction; kernel words recognition; lexical analysis; machine learning method; on-line text collections; rule-based main verbs recognition; sentence extraction; world wide Web; Automation; Data mining; Focusing; Fuzzy systems; Intelligent systems; Kernel; Laboratories; Learning systems; Text recognition; Web sites; Social network; Text Summarization; focused named entities; main verb;
Conference_Title :
Fuzzy Systems and Knowledge Discovery, 2008. FSKD '08. Fifth International Conference on
Conference_Location :
Jinan, Shandong
Print_ISBN :
978-0-7695-3305-6
DOI :
10.1109/FSKD.2008.20