• DocumentCode
    2210136
  • Title
    A supervised discretization algorithm for web page classification
  • Author
    Mangai, J. Alamelu ; Kothari, Dipti S. ; Kumar, V. Santhosh
  • Author_Institution
    Dept. of Comput. Sci., BITS Pilani, Dubai, United Arab Emirates
  • fYear
    2012
  • fDate
    18-20 March 2012
  • Firstpage
    226
  • Lastpage
    231
  • Abstract
    Search engines return a huge number of web pages for each user query, making it difficult to find the desired relevant result. This is due to the exponential growth of the information repository, the WWW. In this paper we implement a supervised discretization algorithm that uses an inconsistency measure and is applied to classifying large-scale databases such as collections of web pages. The algorithm does not require a priori knowledge of the database used and therefore identifies the number of bins automatically. Experiments are conducted on WebKB, a benchmark data set for the machine learning community. The results show a good improvement in classification accuracy with discretized features compared with continuous features.
  • Keywords
    Internet; learning (artificial intelligence); query processing; search engines; very large databases; WWW; Web page classification; WebKB; benchmarking data set; classification accuracy; inconsistency measure; information repository; large scale database classification; machine learning community; search engines; supervised discretization algorithm; user query; Accuracy; Classification algorithms; Machine learning; Machine learning algorithms; Niobium; Numerical models; Web pages; Discretization; Machine learning; Web Page Classification
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Innovations in Information Technology (IIT), 2012 International Conference on
  • Conference_Location
    Abu Dhabi
  • Print_ISBN
    978-1-4673-1100-7
  • Type
    conf
  • DOI
    10.1109/INNOVATIONS.2012.6207737
  • Filename
    6207737