Title :
On Utilizing Stochastic Non-linear Fractional Bin Packing to Resolve Distributed Web Crawling
Author :
Yazidi, Anis ; Oommen, B. John ; Granmo, Ole-Christoffer ; Goodwin, Morten
Author_Institution :
Dept. of Comput. Sci., Univ. Coll. of Oslo & Akershus, Oslo, Norway
Abstract :
This paper deals with the highly pertinent problem of Web crawling, which is far from trivial given the magnitude and all-pervasive nature of the World Wide Web. While numerous AI tools can be used for this task, we map the problem onto the combinatorially hard stochastic non-linear fractional knapsack problem, which is then solved using Learning Automata (LA). Such LA-based solutions have recently been shown to outperform previous state-of-the-art approaches to resource allocation in Web monitoring. However, the ever-growing deployment of distributed systems raises the need for solutions that cope with a distributed setting. In this paper, we present a novel scheme for solving the stochastic non-linear fractional bin packing problem. Furthermore, we demonstrate that our scheme applies to Web crawling, i.e., distributed resource allocation, and in particular to distributed Web monitoring. Comprehensive experimental results demonstrate the superiority of our scheme over other classical approaches.
Keywords :
Internet; automata theory; bin packing; learning (artificial intelligence); stochastic processes; LA; Web crawling; World Wide Web; distributed resource allocation; learning automata; stochastic nonlinear fractional bin packing problem; stochastic nonlinear fractional knapsack problem; Automata; Crawlers; Educational institutions; Materials; Monitoring; Resource management; Web pages; Bin Packing; Distributed Web Monitoring; Learning Automata;
Conference_Title :
Computational Science and Engineering (CSE), 2014 IEEE 17th International Conference on
Conference_Location :
Chengdu
Print_ISBN :
978-1-4799-7980-6
DOI :
10.1109/CSE.2014.40