DocumentCode
456354
Title
Novel Method for Improving Web Text Classifiers Performance Through Machine Learning
Author
Moradi, Parham ; Abdollahzadeh, Ahmad ; Shiri, Mohammad Ibrahim
Author_Institution
Dept. of Comput. Sci., Amir Kabir Univ. of Technol., Tehran
Volume
1
fYear
2006
fDate
0-0 0
Firstpage
534
Lastpage
539
Abstract
Automatic text classification means assigning text documents to categories automatically. Web documents are a kind of text document, but they differ from other text documents in two ways. First, Web documents are structured. Second, Web documents are related to each other through hyperlinks. In this article we propose a novel method for Web text classification that enhances classifier performance in two steps. First, we use Web graph information to create a virtual page for each target Web page, use it in place of the target page, and train classifiers on these virtual pages. Second, we train different classifiers, such as naive Bayes, decision tree, the RIPPER rule learner, and SVM, on different virtual pages. A meta classifier then collects the results of all classifiers and combines them with voting methods. Our experiments show that the meta classifier improves classifier performance.
Keywords
Web sites; classification; learning (artificial intelligence); text analysis; Web graph; Web mining; Web text classification; data mining; machine learning; Classification tree analysis; Computer science; Decision trees; Machine learning; Support vector machine classification; Support vector machines; Testing; Text categorization; Voting; Web pages; Data Mining; Machine Learning; Meta Classifier; Virtual Pages; Web Mining; Web Text Classification; Web Documents
fLanguage
English
Publisher
IEEE
Conference_Titel
Information and Communication Technologies, 2006. ICTTA '06. 2nd
Conference_Location
Damascus
Print_ISBN
0-7803-9521-2
Type
conf
DOI
10.1109/ICTTA.2006.1684427
Filename
1684427