Title :
Feature Reduction for Web Document Classification
Author :
Song, MuHee ; Kang, DongJin ; Lee, SangJo
Author_Institution :
Dept. of Comput. Eng., Kyungpook Nat. Univ., Daegu
Abstract :
This paper proposes an evolutionary Web page classification method, motivated by the need to improve current Web page classification performance. The words that appear in a Web page are used to characterize that page. However, treating every word as a candidate feature does not guarantee better classification performance. To address this, the paper applies principal component analysis (PCA), a statistical analysis method, to reduce a large-scale feature vector to a smaller feature vector containing a few principal elements. It presents simulation experiments that verify both the reduction in feature vector size and the improvement in Web page classification ability. For the classification experiment, the sports news section of Yahoo.com was classified using the Naive Bayesian classification algorithm. The results verified that the proposed news Web page classification method provides satisfactory accuracy on the sports news database.
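The pipeline the abstract describes (word features, PCA reduction, Naive Bayesian classification) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' implementation: the six-word vocabulary, the toy term counts, the two class labels, the choice of two components, the power-iteration routine used to obtain principal components, and the Gaussian variant of Naive Bayes (chosen only because PCA outputs are real-valued).

```python
import math
from collections import defaultdict

# Hypothetical vocabulary and term-count vectors (illustrative, not from the paper).
VOCAB = ["goal", "striker", "keeper", "pitcher", "inning", "homerun"]
DOCS = [
    [3, 2, 1, 0, 0, 0], [2, 3, 2, 0, 1, 0], [4, 1, 2, 1, 0, 0], [2, 2, 3, 0, 0, 1],
    [0, 0, 1, 3, 2, 2], [1, 0, 0, 2, 3, 1], [0, 1, 0, 4, 2, 3], [0, 0, 0, 2, 2, 4],
]
LABELS = ["soccer"] * 4 + ["baseball"] * 4

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fit_pca(rows, k, iters=500):
    """Return (mean, components): top-k eigenvectors of the sample covariance,
    found by power iteration with deflation (an assumed, simple solver)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    components = []
    for _component in range(k):
        start = [float(j + 1) for j in range(d)]  # fixed, deterministic start
        scale = math.sqrt(dot(start, start))
        v = [x / scale for x in start]
        for _step in range(iters):
            w = [dot(row, v) for row in cov]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:  # no variance left to extract
                break
            v = [x / norm for x in w]
        eigval = dot(v, [dot(row, v) for row in cov])
        components.append(v)
        # Deflate: remove the direction just found from the covariance matrix.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return mean, components

def transform(x, mean, components):
    """Center a term-count vector and project it onto the components."""
    centered = [xi - m for xi, m in zip(x, mean)]
    return [dot(centered, c) for c in components]

def fit_gnb(X, y):
    """Gaussian Naive Bayes: per-class log prior, feature means, variances."""
    groups = defaultdict(list)
    for x, label in zip(X, y):
        groups[label].append(x)
    model = {}
    for label, rows in groups.items():
        n, k = len(rows), len(rows[0])
        mu = [sum(r[j] for r in rows) / n for j in range(k)]
        var = [sum((r[j] - mu[j]) ** 2 for r in rows) / n + 1e-6 for j in range(k)]
        model[label] = (math.log(n / len(X)), mu, var)
    return model

def predict_gnb(model, x):
    def score(params):
        prior, mu, var = params
        return prior + sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                           for xi, m, v in zip(x, mu, var))
    return max(model, key=lambda label: score(model[label]))

MEAN, COMPONENTS = fit_pca(DOCS, k=2)                   # 6 features -> 2
reduced = [transform(doc, MEAN, COMPONENTS) for doc in DOCS]
MODEL = fit_gnb(reduced, LABELS)

def classify(term_counts):
    return predict_gnb(MODEL, transform(term_counts, MEAN, COMPONENTS))

print(classify([3, 1, 2, 0, 0, 0]), classify([0, 0, 0, 3, 1, 2]))
```

In practice a numerical library would replace the hand-written PCA and classifier; the pure-Python form is used here only so the sketch runs without dependencies. The number of retained components is a tunable choice and the value 2 is arbitrary.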
Keywords :
Web sites; data reduction; document handling; pattern classification; principal component analysis; Web document classification; evolutionary Web page classification; feature reduction; large-scaled feature vector; statistical analysis; Analytical models; Bayesian methods; Classification algorithms; Information technology; Spatial databases; Terminology; Web pages
Conference_Title :
Computational Intelligence and Security, 2006 International Conference on
Conference_Location :
Guangzhou
Print_ISBN :
1-4244-0605-6
Electronic_ISBN :
1-4244-0605-6
DOI :
10.1109/ICCIAS.2006.294242