DocumentCode
424340
Title
Feature selection and text classification for Chinese Web documents
Author
Xu, Jian-Chao ; Liu, Da-you ; Hu, Ming
Author_Institution
Sch. of Comput. Sci. & Eng., Changchun Univ. of Technol., China
Volume
2
fYear
2004
fDate
26-29 Aug. 2004
Firstpage
1304
Abstract
Many methods for feature selection and text classification have been applied to English Web documents, but few studies have addressed Chinese Web documents. This paper presents a term weighting method based on inverse document frequency, HTML tags, and the length of Chinese phrases; describes a method for selecting Web text features based on the messy genetic algorithm; and provides a Web text classification algorithm based on an improved lattice machine approach. Our experiments show that these methods are valuable.
Keywords
Internet; data mining; feature extraction; genetic algorithms; hypermedia markup languages; text analysis; Chinese Web document; English Web document; HTML tag; Web text feature; feature selection; inverse document frequency; lattice machine approach; messy genetic algorithm; term weighting method; text classification; Computer science; Educational technology; Frequency; Genetic algorithms; HTML; Knowledge engineering; Laboratories; Lattices; Text categorization; Web pages
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Learning and Cybernetics, 2004. Proceedings of 2004 International Conference on
Print_ISBN
0-7803-8403-2
Type
conf
DOI
10.1109/ICMLC.2004.1382394
Filename
1382394