• DocumentCode
    402852
  • Title

    Studies on Chinese Web page classification

  • Author

    Shen, Dou ; Cong, Yan ; Sun, Jian-tao ; Lu, W-chang

  • Author_Institution
    Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
  • Volume
    1
  • fYear
    2003
  • fDate
    2-5 Nov. 2003
  • Firstpage
    23
  • Abstract
    In this paper we study several key aspects of Chinese Web page classification: Web page representation, word segmentation, and feature selection. For the first two aspects, we test published techniques on our Chinese corpora and analyze their performance. For feature selection, we propose taking a word's part of speech (POS) into account during pre-processing, and the experimental results validate this idea.
  • Keywords
    Web sites; classification; Chinese Web page classification; Web page representation; data sets; feature selection; word segmentation; Computer science; Electronic mail; Explosives; Niobium; Performance analysis; Search engines; Sun; Testing; Web pages
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Cybernetics, 2003 International Conference on
  • Print_ISBN
    0-7803-8131-9
  • Type

    conf

  • DOI
    10.1109/ICMLC.2003.1264435
  • Filename
    1264435