DocumentCode
245330
Title
On Utilizing Stochastic Non-linear Fractional Bin Packing to Resolve Distributed Web Crawling
Author
Yazid, Anis ; Oommen, B. John ; Granmo, Ole-Christoffer ; Goodwin, Morten
Author_Institution
Dept. of Comput. Sci., Univ. Coll. of Oslo & Akershus, Oslo, Norway
fYear
2014
fDate
19-21 Dec. 2014
Firstpage
32
Lastpage
37
Abstract
This paper deals with the extremely pertinent problem of web crawling, which is far from trivial considering the magnitude and all-pervasive nature of the World-Wide Web. While numerous AI tools can be used to deal with this task, in this paper we map the problem onto the combinatorially-hard stochastic non-linear fractional knapsack problem, which, in turn, is solved using Learning Automata (LA). Such LA-based solutions have recently been shown to outperform previous state-of-the-art approaches to resource allocation in Web monitoring. However, the ever-growing deployment of distributed systems raises the need for solutions that cope with a distributed setting. In this paper, we present a novel scheme for solving the non-linear fractional bin packing problem. Furthermore, we demonstrate that our scheme has applications to Web crawling, i.e., distributed resource allocation, and in particular, to distributed Web monitoring. Comprehensive experimental results demonstrate the superiority of our scheme when compared to other classical approaches.
Keywords
Internet; automata theory; bin packing; learning (artificial intelligence); stochastic processes; LA; Web crawling; World Wide Web; distributed resource allocation; learning automata; stochastic nonlinear fractional bin packing problem; stochastic nonlinear fractional knapsack problem; Automata; Crawlers; Educational institutions; Materials; Monitoring; Resource management; Web pages; Bin Packing; Distributed Web Monitoring; Learning Automata;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computational Science and Engineering (CSE), 2014 IEEE 17th International Conference on
Conference_Location
Chengdu
Print_ISBN
978-1-4799-7980-6
Type
conf
DOI
10.1109/CSE.2014.40
Filename
7023551