Title :
Feature selection and text classification for Chinese Web documents
Author :
Xu, Jian-Chao ; Liu, Da-you ; Hu, Ming
Author_Institution :
Sch. of Comput. Sci. & Eng., Changchun Univ. of Technol., China
Abstract :
Many methods for feature selection and text classification have been widely applied to English Web documents, but few studies have addressed Chinese Web documents. This paper presents a term weighting method based on inverse document frequency, HTML tags, and the length of Chinese phrases; describes a method for selecting Web text features based on the messy genetic algorithm; and provides an algorithm for Web text classification based on an improved lattice machine approach. Our experiments show that these methods are valuable.
Keywords :
Internet; data mining; feature extraction; genetic algorithms; hypermedia markup languages; text analysis; Chinese Web document; English Web document; HTML tag; Web text feature; feature selection; inverse document frequency; lattice machine approach; messy genetic algorithm; term weighting method; text classification; Computer science; Educational technology; Frequency; HTML; Knowledge engineering; Laboratories; Lattices; Text categorization; Web pages
Conference_Title :
Machine Learning and Cybernetics, 2004. Proceedings of 2004 International Conference on
Print_ISBN :
0-7803-8403-2
DOI :
10.1109/ICMLC.2004.1382394