Title :
Text Feature Extraction Based on Rough Set
Author :
Cheng, Yiyuan ; Zhang, Ruiling ; Wang, Xiufeng ; Chen, Qiushuang
Author_Institution :
Coll. of Inf. Tech. Sci., Nankai Univ., Tianjin
Abstract :
In this paper, a method for text feature extraction based on rough sets (TFERS) is proposed. First, a new formulation of attribute significance is presented, based on the classification capability of the condition attributes; it avoids the recalculation of attribute significance at each iteration of the reduction procedure, which conventional rough-set-based methods require. Second, attribute correlation analysis is incorporated, which helps to achieve a satisfactory reduction of text features. In the text preprocessing phase, the typical vector space representation is extended from the term level to the concept ('synset') level based on WordNet. In this way, the problem of synonymy is solved and the dimension of the feature vector is noticeably reduced. A simulation experiment and applications in text classification show that TFERS can improve classification performance significantly.
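For orientation, the conventional rough-set notion of attribute significance that the abstract contrasts against can be sketched as follows. This is not the TFERS formulation, which the abstract does not spell out; it is the textbook definition, in which the dependency degree is gamma_C(D) = |POS_C(D)| / |U| and the significance of an attribute a is gamma_C(D) - gamma_{C-{a}}(D). The function names and the toy decision table are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, cond, dec):
    """Dependency degree gamma_C(D): share of rows in the positive region,
    i.e. rows whose condition class maps to a single decision value."""
    pos = 0
    for block in partition(rows, cond):
        if len({rows[i][dec] for i in block}) == 1:
            pos += len(block)
    return pos / len(rows)


def significance(rows, cond, dec, a):
    """Classical significance of attribute a: drop in dependency when a is removed."""
    rest = [c for c in cond if c != a]
    return dependency(rows, cond, dec) - dependency(rows, rest, dec)


# Hypothetical toy decision table: binary term-presence features and a class label.
rows = [
    {"ball": 1, "vote": 0, "label": "sport"},
    {"ball": 1, "vote": 0, "label": "sport"},
    {"ball": 0, "vote": 1, "label": "politics"},
    {"ball": 0, "vote": 0, "label": "politics"},
]
cond = ["ball", "vote"]

for a in cond:
    print(a, significance(rows, cond, "label", a))
```

In a conventional reduction loop, this quantity is recomputed for the remaining attributes every time the candidate attribute set changes; that repeated cost is what the abstract says the new formulation avoids.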
Keywords :
classification; correlation methods; data reduction; feature extraction; iterative methods; rough set theory; text analysis; WordNet; attribute correlation analysis; iteration method; reduction procedure; text feature extraction; text feature reduction; text preprocessing phase; vector space representation; Analytical models; Computational modeling; Educational institutions; Fuzzy systems; Internet; Set theory; Text categorization; Text mining; Web pages; attribute significance; reduction; rough set
Conference_Title :
Fuzzy Systems and Knowledge Discovery, 2008. FSKD '08. Fifth International Conference on
Conference_Location :
Shandong
Print_ISBN :
978-0-7695-3305-6
DOI :
10.1109/FSKD.2008.521