DocumentCode
2210136
Title
A supervised discretization algorithm for web page classification
Author
Mangai, J. Alamelu ; Kothari, Dipti S. ; Kumar, V. Santhosh
Author_Institution
Dept. of Comput. Sci., BITS Pilani, Dubai, United Arab Emirates
fYear
2012
fDate
18-20 March 2012
Firstpage
226
Lastpage
231
Abstract
Search engines return a huge number of web pages for each user query, making it difficult to find the relevant results. This is due to the exponential growth of the information repository, the WWW. In this paper we implement a supervised discretization algorithm, based on an inconsistency measure, that is used for classifying large-scale databases such as web pages. The algorithm does not require a priori knowledge of the database used and therefore identifies the number of bins automatically. Experiments are conducted on WebKB, a benchmark data set for the machine learning community. The results show improved classification accuracy with discretized features compared with continuous features.
Keywords
Internet; learning (artificial intelligence); query processing; search engines; very large databases; WWW; Web page classification; WebKB; benchmarking data set; classification accuracy; inconsistency measure; information repository; large scale database classification; machine learning community; search engines; supervised discretization algorithm; user query; Accuracy; Classification algorithms; Machine learning; Machine learning algorithms; Niobium; Numerical models; Web pages; Discretization; Machine learning; Web Page Classification;
fLanguage
English
Publisher
IEEE
Conference_Titel
Innovations in Information Technology (IIT), 2012 International Conference on
Conference_Location
Abu Dhabi
Print_ISBN
978-1-4673-1100-7
Type
conf
DOI
10.1109/INNOVATIONS.2012.6207737
Filename
6207737