DocumentCode
2539330
Title
Research on Text Clustering Based on Concept Weight
Author
Li, Yuqin ; Lv, Xueqiang ; Liu, Yufang ; Shi, Shuicai
Author_Institution
Chinese Inf. Process. Res. Center, Beijing Inf. Sci. & Technol. Univ., Beijing, China
fYear
2010
fDate
13-15 Dec. 2010
Firstpage
232
Lastpage
235
Abstract
Through research on methods for calculating feature-word weights in texts and the semantic similarity between words, we propose a method for calculating feature-word weights based on concept weight, addressing the semantic association among text features and the high-dimensionality problem common in the text vector space model. This method reduces both the semantic loss of the feature set and the dimension of the text vector, which improves the text vector space model and the quality of text clustering. Experimental results show that the method is feasible and that concept-weight-based text clustering scores about 22 percentage points higher than clustering without concept weights in the final evaluation of the FI index value.
Keywords
feature extraction; pattern clustering; set theory; text analysis; word processing; concept weight-based text clustering; feature set; feature word; semantic association phenomenon; text vector space model; Data mining; Data models; Electronic mail; Feature extraction; Information processing; Information science; Semantics; Concept Document Frequency; Concept Frequency; Concept Weight; Text Clustering;
fLanguage
English
Publisher
IEEE
Conference_Titel
Genetic and Evolutionary Computing (ICGEC), 2010 Fourth International Conference on
Conference_Location
Shenzhen
Print_ISBN
978-1-4244-8891-9
Electronic_ISBN
978-0-7695-4281-2
Type
conf
DOI
10.1109/ICGEC.2010.64
Filename
5715412