DocumentCode
2020271
Title
Using Modified CHI Square and Rough Set for Text Categorization with Many Redundant Features
Author
Dai, Liuling; Hu, Jinwu; Liu, WanChun
Author_Institution
Sch. of Comput. Sci., Beijing Inst. of Technol., Beijing
Volume
1
fYear
2008
fDate
17-18 Oct. 2008
Firstpage
182
Lastpage
185
Abstract
Text categorization is a key problem in text mining. Although there has been much research on this problem, most work focuses on classification into broad categories. Very little research addresses text categorization problems characterised by many redundant features. We call this kind of problem fine-text-categorization. In this paper, we present an algorithm based on modified CHI square feature selection and rough sets to solve this problem. The features of each category are selected in an aggressive manner. The classification rules are extracted using rough set theory. Experiments on real-world corpora show that our algorithm clearly improves classification precision and is therefore promising.
Keywords
feature extraction; rough set theory; text analysis; CHI square feature selection; redundant features; rough set; text categorization; text mining; Competitive intelligence; Computational intelligence; Computer science; Information retrieval; Information technology; Laboratories; Machine learning algorithms; Partial response channels; Support vector machines; SVM; feature selection
fLanguage
English
Publisher
IEEE
Conference_Titel
Computational Intelligence and Design, 2008. ISCID '08. International Symposium on
Conference_Location
Wuhan
Print_ISBN
978-0-7695-3311-7
Type
conf
DOI
10.1109/ISCID.2008.178
Filename
4725586