DocumentCode
402852
Title
Studies on Chinese Web page classification
Author
Shen, Dou ; Cong, Yan ; Sun, Jian-tao ; Lu, W-chang
Author_Institution
Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
Volume
1
fYear
2003
fDate
2-5 Nov. 2003
Firstpage
23
Abstract
In this paper we study several key aspects of Chinese Web page classification: Web page representation, word segmentation, and feature selection. For the first two aspects, we test published techniques on our Chinese corpora and analyze their performance. For feature selection, we propose taking a word's part of speech (POS) into account during pre-processing, and the experimental results validate this idea.
Keywords
Web sites; classification; Chinese Web page classification; Web page representation; data sets; feature selection; word segmentation; Computer science; Electronic mail; Explosives; Niobium; Performance analysis; Search engines; Sun; Testing; Web pages
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Learning and Cybernetics, 2003 International Conference on
Print_ISBN
0-7803-8131-9
Type
conf
DOI
10.1109/ICMLC.2003.1264435
Filename
1264435