• DocumentCode
    3291774
  • Title
    Design of a Distributed Spiders System Based on Web Service
  • Author
    Guangli, Li; Hongbin, Zhang
  • Author_Institution
    East China Jiaotong Univ., China
  • fYear
    2009
  • fDate
    6-7 June 2009
  • Firstpage
    167
  • Lastpage
    170
  • Abstract
    A prototype of a distributed spider (Web crawler) system was designed using Web services based on service-oriented architecture (SOA). The prototype consists of a server and several clients. The server directs the clients to download new Web pages according to the pages already crawled. After analyzing each downloaded page with multiple threads, the clients manage the to-crawl URL queue, the crawled URL queue, and the noise URL queue. They also maintain a connection with the server to pass it unknown URLs and domain names. The server consists of a front platform and a background. The front platform controls the clients, which includes the load-balancing policy and real-time monitoring of the clients through Microsoft Message Queue (MSMQ). The Web service is deployed on the server background, which contains the persistent data connection structure. Through this structure, the front platform and the clients access data via a standard interface. Finally, extensive experiments show that the distributed spider system is robust.
  • Keywords
    Web services; queueing theory; software architecture; Microsoft Message Queue; Web service; crawled URL queues; distributed spiders system; noise URL queue; service-oriented architecture; Application software; Internet; Monitoring; Physics computing; Queueing analysis; Service oriented architecture; Uniform resource locators; Web pages; Web server
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Mining and Web-based Application, 2009. WMWA '09. Second Pacific-Asia Conference on
  • Conference_Location
    Wuhan
  • Print_ISBN
    978-0-7695-3646-0
  • Type
    conf
  • DOI
    10.1109/WMWA.2009.15
  • Filename
    5232493