• DocumentCode
    526653
  • Title
    Associative web document classification based on word mixed weight
  • Author
    Li, Xingyi ; Lan, Jun ; Shi, Huaji
  • Author_Institution
    Dept. of Comput. Sci. & Telecommun. Eng., Jiangsu Univ., Zhenjiang, China
  • Volume
    3
  • fYear
    2010
  • fDate
    9-11 July 2010
  • Firstpage
    578
  • Lastpage
    581
  • Abstract
    Classification based on association rules has two shortcomings when applied to web documents. First, it processes the web document as plain text, ignoring the HTML tag information of the web page. Second, the items of the association rules are either just the words in the web page, with no word weight considered, or are weighted by word frequency alone, ignoring the importance of a word's location in the web document. This paper therefore proposes a new, efficient method. It calculates each word's mixed weight from HTML tag features, and then mines classification rules based on the mixed weight to classify web pages. Experimental results show that this approach performs better than traditional associative classification methods.
  • Keywords
    Internet; Web sites; classification; data mining; document handling; HTML tags information; Web page classification rules; associated classification method; association rules; associative Web document classification; word frequency; word mixed weight; Artificial neural networks; HTML; Niobium; Variable speed drives; HTML tags; mixed weight; web document classification
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science and Information Technology (ICCSIT), 2010 3rd IEEE International Conference on
  • Conference_Location
    Chengdu
  • Print_ISBN
    978-1-4244-5537-9
  • Type
    conf
  • DOI
    10.1109/ICCSIT.2010.5564804
  • Filename
    5564804