Title :
A supervised discretization algorithm for web page classification
Author :
Mangai, J. Alamelu ; Kothari, Dipti S. ; Kumar, V. Santhosh
Author_Institution :
Dept. of Comput. Sci., BITS Pilani, Dubai, United Arab Emirates
Abstract :
Search engines return a huge number of web pages for each user query, making it difficult to find the desired relevant results. This is due to the exponential growth in the size of the World Wide Web (WWW) as an information repository. In this paper, we implement a supervised discretization algorithm that uses an inconsistency measure for classifying large-scale data such as web pages. The algorithm requires no a priori knowledge about the database and identifies the number of bins automatically. Experiments were conducted on WebKB, a benchmark data set used by the machine learning community. The results show a clear improvement in classification accuracy with discretized features compared with continuous features.
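The abstract does not spell out the algorithm itself. As a rough illustration only, the Python sketch below shows one common way an inconsistency measure can drive supervised discretization of a single continuous feature. Instances that share the same discretized value but carry different class labels are inconsistent; the inconsistency rate is the number of instances not in their group's majority class, divided by the total number of instances. Starting from one bin per distinct value, adjacent bins are greedily merged while the rate stays within a threshold, so the number of bins emerges from the data. This is not the authors' method: the greedy merging strategy, the default threshold, and the names `inconsistency_rate`, `assign_bins`, `discretize`, and `max_rate` are all hypothetical.

```python
from bisect import bisect_right
from collections import Counter, defaultdict


def inconsistency_rate(patterns, labels):
    """For each distinct pattern, count the instances outside its majority
    class; return the total divided by the number of instances."""
    groups = defaultdict(Counter)
    for p, y in zip(patterns, labels):
        groups[p][y] += 1
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return inconsistent / len(labels)


def assign_bins(values, cuts):
    """Map each value to the index of its interval given sorted cut points."""
    return [bisect_right(cuts, v) for v in values]


def discretize(values, labels, max_rate=None):
    """Hypothetical sketch (not the paper's algorithm): greedily remove cut
    points, i.e. merge adjacent intervals, while the inconsistency rate of
    the binned feature stays at or below max_rate."""
    distinct = sorted(set(values))
    # Start with one cut point midway between each pair of distinct values,
    # so every distinct value initially sits in its own bin.
    cuts = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    if max_rate is None:
        # Assumed default: never exceed the inconsistency of the raw feature.
        max_rate = inconsistency_rate(values, labels)
    while cuts:
        best_rate, best_i = None, None
        for i in range(len(cuts)):
            trial = cuts[:i] + cuts[i + 1:]
            rate = inconsistency_rate(assign_bins(values, trial), labels)
            if best_rate is None or rate < best_rate:
                best_rate, best_i = rate, i
        if best_rate > max_rate:
            break  # every remaining merge would add too much inconsistency
        cuts.pop(best_i)
    return cuts
```

For example, `discretize([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], ['a', 'a', 'a', 'b', 'b', 'b'])` returns `[6.5]`, i.e. two bins, without the number of bins being specified in advance. The exhaustive search over cut points is quadratic in the number of distinct values and is meant for readability, not for web-scale data.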
Keywords :
Internet; learning (artificial intelligence); query processing; search engines; very large databases; WWW; Web page classification; WebKB; benchmarking data set; classification accuracy; inconsistency measure; information repository; large scale database classification; machine learning community; search engines; supervised discretization algorithm; user query; Accuracy; Classification algorithms; Machine learning; Machine learning algorithms; Numerical models; Web pages; Discretization; Machine learning; Web Page Classification;
Conference_Title :
Innovations in Information Technology (IIT), 2012 International Conference on
Conference_Location :
Abu Dhabi
Print_ISBN :
978-1-4673-1100-7
DOI :
10.1109/INNOVATIONS.2012.6207737