DocumentCode
3291774
Title
Design of a Distributed Spiders System Based on Web Service
Author
Guangli, Li ; Hongbin, Zhang
Author_Institution
East China Jiaotong Univ., China
fYear
2009
fDate
6-7 June 2009
Firstpage
167
Lastpage
170
Abstract
A prototype distributed spider (web crawler) system was designed using Web services on a service-oriented architecture (SOA). The prototype consists of a server and several clients. The server directs the clients to download new Web pages according to the pages already crawled. Each client analyzes the downloaded pages with multiple threads and manages its to-crawl URL queue, crawled URL queue, and noise URL queue. The clients also keep a connection to the server so that they can pass on unknown URLs and domain names. The server consists of a front platform and a back end. The front platform controls the clients: it implements the load-balancing policy and monitors the clients in real time through Microsoft Message Queue (MSMQ). The Web service is deployed on the server back end, which contains a persistent data-connection layer; through this layer, the front platform and the clients access data via a standardized interface. Finally, extensive experiments show that the distributed spider system is robust.
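The client-side queue handling and the server-side assignment of new domains described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the noise criteria, the load-balancing policy, or the message formats, so the noise rule (non-HTTP links and static resources), the least-loaded domain assignment, and all names (`UrlQueues`, `worker`, `run_client`, `Dispatcher`, `fetch_links`) are hypothetical. In-process Python queues stand in for the MSMQ and Web service calls, and `fetch_links` is a stub for downloading and parsing a page.

```python
import queue
import threading
from urllib.parse import urlparse

# Hypothetical noise rule (assumption): non-HTTP links and static resources are not crawled.
NOISE_SUFFIXES = (".jpg", ".png", ".gif", ".css", ".js")


class UrlQueues:
    """Illustrative client-side to-crawl, crawled and noise URL queues."""

    def __init__(self, allowed_domains):
        self.to_crawl = queue.Queue()      # URLs waiting to be downloaded
        self.crawled = []                  # URLs already downloaded
        self.noise = []                    # URLs rejected as noise
        self.unknown_domains = set()       # domains to report to the server
        self.allowed = set(allowed_domains)
        self._seen = set()                 # URLs already queued or rejected, for de-duplication
        self._lock = threading.Lock()      # guards the shared lists and sets

    def classify(self, url):
        """Route one URL to the noise list, the unknown-domain set or the to-crawl queue."""
        parts = urlparse(url)
        with self._lock:
            if url in self._seen:
                return "seen"
            if parts.scheme not in ("http", "https") or parts.path.lower().endswith(NOISE_SUFFIXES):
                self._seen.add(url)
                self.noise.append(url)
                return "noise"
            if parts.hostname not in self.allowed:
                self.unknown_domains.add(parts.hostname)  # left for the server to assign
                return "unknown"
            self._seen.add(url)
        self.to_crawl.put(url)
        return "queued"

    def mark_crawled(self, url):
        with self._lock:
            self.crawled.append(url)


def worker(queues, fetch_links):
    """One crawler thread: 'download' a page, then classify every link found on it."""
    while True:
        url = queues.to_crawl.get()
        try:
            if url is None:                # sentinel: shut this thread down
                return
            for link in fetch_links(url):  # stub for download + link extraction
                queues.classify(link)
            queues.mark_crawled(url)
        finally:
            queues.to_crawl.task_done()


def run_client(seed, allowed_domains, fetch_links, n_threads=4):
    queues = UrlQueues(allowed_domains)
    queues.classify(seed)
    threads = [threading.Thread(target=worker, args=(queues, fetch_links)) for _ in range(n_threads)]
    for t in threads:
        t.start()
    queues.to_crawl.join()                 # wait until every queued URL has been processed
    for _ in threads:
        queues.to_crawl.put(None)          # one stop sentinel per thread
    for t in threads:
        t.join()
    return queues


class Dispatcher:
    """Hypothetical server-side policy: each new domain goes to the least-loaded client."""

    def __init__(self, client_ids):
        self.load = {cid: 0 for cid in client_ids}  # number of domains assigned per client
        self.owner = {}                             # domain -> client id

    def assign(self, domain):
        if domain not in self.owner:
            cid = min(self.load, key=lambda c: (self.load[c], c))  # ties broken by client id
            self.owner[domain] = cid
            self.load[cid] += 1
        return self.owner[domain]


# Tiny offline "web" so the sketch runs without network access.
PAGES = {
    "http://a.example/": ["http://a.example/x", "http://a.example/logo.png", "http://b.example/"],
    "http://a.example/x": ["http://a.example/", "mailto:me@a.example"],
}
client = run_client("http://a.example/", {"a.example"}, lambda url: PAGES.get(url, []), n_threads=2)
server = Dispatcher(["client-1", "client-2"])
assignments = {d: server.assign(d) for d in sorted(client.unknown_domains)}
```

In this run the client crawls the two `a.example` pages, files the image link and the `mailto:` link as noise, and reports `b.example` as an unknown domain, which the dispatcher hands to the least-loaded client. Keeping a single lock around the shared sets lets several worker threads classify links concurrently without queuing the same URL twice.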
Keywords
Web services; queueing theory; software architecture; Microsoft Message Queue; Web service; crawled URL queues; distributed spiders system; noise URL queue; service-oriented architecture; Application software; Internet; Monitoring; Physics computing; Queueing analysis; Service oriented architecture; Uniform resource locators; Web pages; Web server; Web services;
fLanguage
English
Publisher
ieee
Conference_Titel
Web Mining and Web-based Application, 2009. WMWA '09. Second Pacific-Asia Conference on
Conference_Location
Wuhan
Print_ISBN
978-0-7695-3646-0
Type
conf
DOI
10.1109/WMWA.2009.15
Filename
5232493