DocumentCode :
1716445
Title :
Hybrid dimensionality reduction approach for web page classification
Author :
Sarode, Shraddha ; Gadge, Jayant
Author_Institution :
Comput. Eng. (M.E), Thadomal Shahani Eng. Coll., Mumbai, India
fYear :
2015
Firstpage :
1
Lastpage :
6
Abstract :
A huge amount of data is now available on the World Wide Web, and web page classification is one way to manage it. This paper addresses one issue in web page classification: high dimensionality, where dimensionality refers to the number of terms in a web page. High dimensionality makes web pages difficult to classify, so the main objective of reducing it is to improve the performance of the classifier. The paper describes a hybrid approach to dimensionality reduction for web page classification that combines a rough set method with the information gain method. Information gain is used for feature selection, and the rough set based Quick Reduct algorithm is used for dimensionality reduction. Web pages are classified using the naïve Bayesian method. The proposed architecture was tested, and significant results were obtained.
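The information gain step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each page is represented as a set of terms, treats a term as a binary present/absent feature, and uses hypothetical names (`entropy`, `information_gain`, `select_top_terms`) and made-up toy data. The Quick Reduct and naïve Bayes stages are not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """Information gain of a term: H(class) - H(class | term present or absent).

    Assumption: each document is a set of terms (binary presence only).
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (
        (len(with_term) / n) * entropy(with_term)
        + (len(without_term) / n) * entropy(without_term)
    )
    return entropy(labels) - conditional


def select_top_terms(docs, labels, k):
    """Rank the vocabulary by information gain and keep the k best terms."""
    vocabulary = sorted(set().union(*docs))
    ranked = sorted(
        vocabulary,
        key=lambda t: information_gain(docs, labels, t),
        reverse=True,
    )
    return ranked[:k]


# Toy data, invented purely for illustration.
docs = [
    {"goal", "match", "team"},
    {"team", "score", "match"},
    {"stock", "market", "match"},
    {"market", "price", "stock"},
]
labels = ["sports", "sports", "finance", "finance"]

top_terms = select_top_terms(docs, labels, 3)
```

In this toy set, a term such as "team" separates the two classes perfectly and so scores higher than "match", which appears in both classes. In a hybrid pipeline of the kind the abstract describes, the retained terms would then be passed to a further reduction stage before classification.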
Keywords :
Bayes methods; Web sites; data reduction; feature selection; pattern classification; rough set theory; Web page classification; hybrid dimensionality reduction approach; information gain method; naive Bayesian method; quick reduct algorithm; rough set; Accuracy; Approximation methods; Classification algorithms; Computers; Web pages; Dimensionality Reduction; Information gain; Naïve Bayes;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Communication, Information & Computing Technology (ICCICT), 2015 International Conference on
Conference_Location :
Mumbai
Print_ISBN :
978-1-4799-5521-3
Type :
conf
DOI :
10.1109/ICCICT.2015.7045679
Filename :
7045679