• DocumentCode
    3076562
  • Title
    A Study on Automatic Web Pages Categorization
  • Author
    Bo, Sun ; Qiurui, Sun ; Zhong, Chen ; Zengmei, Fu
  • Author_Institution
    Coll. of Inf. Sci. & Technol., Beijing Normal Univ., Beijing
  • fYear
    2009
  • fDate
    6-7 March 2009
  • Firstpage
    1423
  • Lastpage
    1427
  • Abstract
    Since the Internet has become a huge repository of information, many studies address the problem of web page categorization. For web page classification, we want to find a subset of words that helps to discriminate between different kinds of web pages, so we introduce feature selection. In this paper, we study feature selection methods such as ReliefF and Symmetrical Uncertainty (SU). The high-dimensional text vocabulary space is also one of the main challenges of web page classification; we used Hidden Naive Bayes (HNB), Complement class Naive Bayes, and other traditional techniques for web page classification. Results on a benchmark dataset show that HNB performs more satisfactorily than the other methods and that SU is more competitive than ReliefF for selecting relevant words in web page categorization.
  • Keywords
    Bayes methods; Internet; Web sites; pattern classification; text analysis; ReliefF; Web page classification; automatic Web pages categorization; complement class naive Bayes method; feature selection; hidden naive Bayes method; symmetrical uncertainty; text vocabulary space; words selection; Data mining; Educational institutions; Entropy; Equations; Nearest neighbor searches; Sun; Uncertainty; Web mining; Web pages; Web pages categorization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advance Computing Conference, 2009. IACC 2009. IEEE International
  • Conference_Location
    Patiala
  • Print_ISBN
    978-1-4244-2927-1
  • Electronic_ISBN
    978-1-4244-2928-8
  • Type
    conf
  • DOI
    10.1109/IADCC.2009.4809225
  • Filename
    4809225