Title :
Classification of web pages on attractiveness: A supervised learning approach
Author :
Khade, G. ; Kumar, Sudhakar ; Bhattacharya, Surya
Author_Institution :
Dept. of Comput. Sci. & Eng., Indian Inst. of Technol. Guwahati, Guwahati, India
Abstract :
Random surfers spend very little time on a web page. If the most important content of a page fails to attract a surfer's attention within this short time span, the surfer moves on to another page, defeating the purpose of the web page designer. To predict whether the contents of a web page will catch a random surfer's attention, we propose a machine-learning-based approach that classifies web pages into “bad” and “not bad” classes, where the “bad” class implies poor attention-drawing ability. To develop the classifier, we divide web page contents into “objects”, which are coherent regions of a web page conveying the same information. We surveyed 100 web pages sampled from the Internet to identify the types and frequencies of objects used in web page design. From this survey, we identified the six types of objects that are most important in determining the class of a web page in terms of its attention-drawing capability. We used the WEKA tool to implement the machine learning approach. Two percentage-split strategies and three cross-validation strategies are used to check the accuracy of the classifier. We experimented with 65 algorithms supported by WEKA and found that, among the 65, the RBF network and Random subspace algorithms give the best performance, with about 83% accuracy.
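The two evaluation schemes named in the abstract, percentage split and cross validation, can be illustrated with a minimal Python sketch. This is not the authors' WEKA setup: the 66% split, the 10 folds, the labels, and the majority-class placeholder classifier are all assumptions chosen only to show how the two schemes partition a set of 100 labelled pages.

```python
import random


def percentage_split(n_samples, train_fraction=0.66, seed=0):
    """Shuffle sample indices and split them into train and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = round(n_samples * train_fraction)
    return indices[:cut], indices[cut:]


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def majority_class(train_labels):
    """Placeholder classifier: predict the most frequent training label."""
    return max(sorted(set(train_labels)), key=train_labels.count)


def accuracy(train, test, labels):
    """Fraction of test samples whose label matches the prediction."""
    prediction = majority_class([labels[i] for i in train])
    return sum(labels[i] == prediction for i in test) / len(test)


# Made-up labels for 100 pages (assumption; not data from the paper).
labels = ["bad"] * 40 + ["not bad"] * 60

# Percentage split: train on one part, test on the held-out remainder.
train, test = percentage_split(len(labels), train_fraction=0.66)
split_accuracy = accuracy(train, test, labels)

# Cross validation: average the accuracy over k held-out folds.
fold_accuracies = [
    accuracy(tr, te, labels) for tr, te in k_fold_splits(len(labels), k=10)
]
cv_accuracy = sum(fold_accuracies) / len(fold_accuracies)

print(f"percentage split accuracy: {split_accuracy:.2f}")
print(f"10-fold cross-validation accuracy: {cv_accuracy:.2f}")
```

A percentage split gives one accuracy estimate from a single held-out part, while cross validation averages over several held-out folds, so every page is used for testing exactly once.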
Keywords :
Internet; Web design; learning (artificial intelligence); pattern classification; radial basis function networks; RBF network; WEKA; Web page classification; Web page design; bad class; classifier approach; cross validation strategy; machine learning based approach; not bad class; percentage split strategy; random subspace; random surfer; supervised learning approach; Accuracy; Algorithm design and analysis; Guidelines; Testing; Web pages; attention; classifier; objects
Conference_Title :
Intelligent Human Computer Interaction (IHCI), 2012 4th International Conference on
Conference_Location :
Kharagpur
Print_ISBN :
978-1-4673-4367-1
DOI :
10.1109/IHCI.2012.6481867