DocumentCode :
3042877
Title :
A Novel Scheme for Term Weighting in Text Categorization: Positive Impact Factor
Author :
Emmanuel, M. ; Khatri, Saurabh M. ; Babu, D. R. Ramesh
Author_Institution :
Dept. of Inf. Technol., Pune Inst. of Comput. Technol., Pune, India
fYear :
2013
fDate :
13-16 Oct. 2013
Firstpage :
2292
Lastpage :
2297
Abstract :
Data mining and knowledge discovery are now used in a wide variety of machine learning systems, and text categorization is an important area of machine learning. Feature selection and term weighting are two steps that strongly influence the outcome of any text categorization task. This paper focuses on effective term weighting and proposes a novel term weighting approach, the Positive Impact Factor (PIF). PIF is a supervised variation of traditional term weighting models. The scheme rests on the assumption "Positive impact of a feature to a category can be used to calculate its negative impact for other categories." The scheme is evaluated on the Classic 3 dataset from Cornell, whose documents fall into 3 predefined categories. The experimental results, compared with existing methods such as Binary, TF, TF-IDF, TF-RF, and TF-CHI2, show a remarkable improvement in accuracy together with a significant reduction in computational cost.
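For orientation, the following is a minimal sketch of three of the unsupervised baseline weightings named in the abstract (Binary, TF, and TF-IDF). It is not the authors' code, and it does not attempt PIF, whose formula the abstract does not give. The toy corpus, the whitespace tokenizer, the function names, and the unsmoothed natural-log IDF variant are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def binary_weights(doc_tokens):
    # Binary: 1.0 for every term that occurs in the document.
    return {term: 1.0 for term in set(doc_tokens)}


def tf_weights(doc_tokens):
    # TF: raw count of each term in the document.
    return {term: float(count) for term, count in Counter(doc_tokens).items()}


def idf_table(corpus_tokens):
    # IDF: log(N / df), where df is the number of documents containing the term.
    # Assumed variant: natural log, no smoothing.
    n_docs = len(corpus_tokens)
    doc_freq = Counter()
    for tokens in corpus_tokens:
        doc_freq.update(set(tokens))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tfidf_weights(doc_tokens, idf):
    # TF-IDF: raw term count multiplied by the term's IDF.
    return {term: count * idf.get(term, 0.0)
            for term, count in Counter(doc_tokens).items()}


# Hypothetical three-document corpus, invented for this sketch.
corpus = [
    "wing flow pressure flow",
    "patient blood pressure",
    "library index retrieval index",
]
docs = [tokenize(text) for text in corpus]
idf = idf_table(docs)

binary = binary_weights(docs[0])
tf = tf_weights(docs[0])
tfidf = tfidf_weights(docs[0], idf)
```

In this toy corpus "pressure" occurs in two of the three documents, so its IDF is lower than that of "flow", which occurs in only one. Supervised schemes such as TF-RF, TF-CHI2, and PIF additionally use the category labels of the training documents, which this sketch leaves out.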
Keywords :
data mining; learning (artificial intelligence); text analysis; PIF; feature selection; knowledge discovery; machine learning system; positive impact factor; term weighting; text categorization; Accuracy; Machine learning algorithms; Radio frequency; Support vector machines; Testing; Training; Vector Space model; Weighting Scheme
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Systems, Man, and Cybernetics (SMC), 2013 IEEE International Conference on
Conference_Location :
Manchester
Type :
conf
DOI :
10.1109/SMC.2013.392
Filename :
6722145