  • DocumentCode
    424340
  • Title
    Feature selection and text classification for Chinese Web documents
  • Author
    Xu, Jian-Chao; Liu, Da-you; Hu, Ming
  • Author_Institution
    Sch. of Comput. Sci. & Eng., Changchun Univ. of Technol., China
  • Volume
    2
  • fYear
    2004
  • fDate
    26-29 Aug. 2004
  • Firstpage
    1304
  • Abstract
    Many methods for feature selection and text classification have been widely applied to English Web documents, but few studies have addressed Chinese Web documents. This paper presents a term weighting method based on inverse document frequency, HTML tags, and Chinese phrase length; describes a method for selecting Web text features based on the messy genetic algorithm; and provides a Web text classification algorithm based on an improved lattice machine approach. Our experiments show that these methods are valuable.
  • Keywords
    Internet; data mining; feature extraction; genetic algorithms; hypermedia markup languages; text analysis; Chinese Web document; English Web document; HTML tag; Web text feature; feature selection; inverse document frequency; lattice machine approach; messy genetic algorithm; term weighting method; text classification; Computer science; Educational technology; Frequency; Genetic algorithms; HTML; Knowledge engineering; Laboratories; Lattices; Text categorization; Web pages
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Machine Learning and Cybernetics, 2004. Proceedings of 2004 International Conference on
  • Print_ISBN
    0-7803-8403-2
  • Type
    conf
  • DOI
    10.1109/ICMLC.2004.1382394
  • Filename
    1382394