• DocumentCode
    456354
  • Title

    Novel Method for Improving Web Text Classifiers Performance Through Machine Learning

  • Author

    Moradi, Parham ; Abdollahzadeh, Ahmad ; Shiri, Mohammad Ibrahim

  • Author_Institution
    Dept. of Comput. Sci., Amir Kabir Univ. of Technol., Tehran
  • Volume
    1
  • fYear
    0
  • fDate
    0-0 0
  • Firstpage
    534
  • Lastpage
    539
  • Abstract
    Automatic text classification is the task of automatically assigning text documents to categories. Web documents are text documents, but they differ from plain text in two ways: they are structured, and they are related to one another through hyperlinks. In this article we propose a novel method for Web text classification that improves classifier performance in two steps. First, we use Web graph information to create a virtual page for each target Web page and train the classifiers on these virtual pages instead of the original pages. Second, we train several different classifiers (naive Bayes, decision tree, the RIPPER rule learner, and SVM) on different virtual pages, then use a meta classifier to collect their results and combine them with voting methods. Our experiments show that the meta classifier improves classification performance.
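The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how a virtual page is assembled or which voting scheme is used, so the sketch assumes the simplest options (concatenating the target page's text with the text of its linked neighbours, and plain majority voting). The names `build_virtual_page`, `majority_vote`, and `meta_classify` are hypothetical, and the keyword rules merely stand in for trained naive Bayes, decision tree, RIPPER, and SVM models.

```python
from collections import Counter


def build_virtual_page(target_url, pages, links):
    """Assumed construction: the target's text followed by the text of linked pages."""
    neighbours = links.get(target_url, [])
    parts = [pages[target_url]] + [pages[n] for n in neighbours if n in pages]
    return " ".join(parts)


def majority_vote(predictions):
    """Return the most frequent label; ties go to the label that appears first.

    Expects a non-empty list of labels.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def meta_classify(virtual_page, classifiers):
    """Collect one prediction per base classifier and combine them by voting."""
    return majority_vote([clf(virtual_page) for clf in classifiers])


# Toy data: page "a" links to pages "b" and "c".
pages = {
    "a": "course syllabus",
    "b": "lecture notes homework",
    "c": "faculty research",
}
links = {"a": ["b", "c"]}

# Placeholder keyword rules standing in for trained base classifiers.
classifiers = [
    lambda text: "course" if "lecture" in text else "other",
    lambda text: "course" if "homework" in text else "other",
    lambda text: "faculty" if "research" in text else "other",
]

virtual = build_virtual_page("a", pages, links)
label = meta_classify(virtual, classifiers)
```

In this toy run the first two rules fire only because the virtual page carries text from the linked pages, which is the kind of benefit the abstract attributes to using Web graph information.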
  • Keywords
    Web sites; classification; learning (artificial intelligence); text analysis; Web graph; Web mining; Web text classification; data mining; machine learning; Classification tree analysis; Computer science; Decision trees; Machine learning; Support vector machine classification; Support vector machines; Testing; Text categorization; Voting; Web pages; Data Mining; Machine Learning; Meta Classifier; Virtual Pages; Web Mining; Web Text Classification; Web Documents;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information and Communication Technologies, 2006. ICTTA '06. 2nd
  • Conference_Location
    Damascus
  • Print_ISBN
    0-7803-9521-2
  • Type

    conf

  • DOI
    10.1109/ICTTA.2006.1684427
  • Filename
    1684427