• DocumentCode
    2308302
  • Title

    Load Balancing Using Consistent Hashing: A Real Challenge for Large Scale Distributed Web Crawlers

  • Author

    Nasri, Mitra ; Sharifi, Mohsen

  • Author_Institution
    Comput. Eng. Dept., Iran Univ. of Sci. & Technol., Tehran
  • fYear
    2009
  • fDate
    26-29 May 2009
  • Firstpage
    715
  • Lastpage
    720
  • Abstract
    Large-scale search engines now use distributed Web crawlers to collect Web pages because it is impractical for a single machine to download the entire Web. Load balancing across such crawlers is important because each crawling machine has limited memory and other resources. Existing distributed crawlers use simple URL hashing based on site names as their partitioning policy. In a distributed environment, this can be done with consistent hashing, which dynamically handles crawling nodes joining and leaving. This method is formally claimed to be load balanced when the hash function is uniform. Given that, according to existing statistics, the Web's structure follows a power-law distribution, we argue that a uniform random hash function based on a site's URL cannot be load balanced for large-scale distributed Web crawlers. We support this claim by applying Web statistics to consistent hashing as it is used in one of the well-known Web crawlers. We also report experimental results that demonstrate the effect on load balancing of relying solely on a hash of host names.
  • Keywords
    Internet; cryptography; file organisation; search engines; URL hashing; Web pages; Web statistics; Web structure; consistent hashing; large scale distributed Web crawlers; large scale search engines; load balancing; site names; uniform random hash function; Application software; Computer networks; Crawlers; Internet; Large-scale systems; Load management; Search engines; Statistical distributions; Uniform resource locators; Web pages; Consistent Hash; Distributed Web Crawler; Large Scale Distributed Systems; Load Balancing; Power Law; Web Crawlers;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advanced Information Networking and Applications Workshops, 2009. WAINA '09. International Conference on
  • Conference_Location
    Bradford
  • Print_ISBN
    978-1-4244-3999-7
  • Electronic_ISBN
    978-0-7695-3639-2
  • Type

    conf

  • DOI
    10.1109/WAINA.2009.96
  • Filename
    5136733
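
The argument in the abstract can be illustrated with a small simulation. What follows is a minimal sketch under stated assumptions, not the authors' experiment or the crawler they analyse: the host names (`siteN.example`), the node names (`crawler-N`), the MD5 ring hash, the 100 virtual points per node, and the Zipf-like page counts (the site of rank r holds roughly `largest_site / r` pages) are all hypothetical choices made for illustration. It assigns hosts to crawling nodes with consistent hashing, then compares how evenly hosts and pages are spread.

```python
import bisect
import hashlib
from collections import Counter


def ring_hash(key: str) -> int:
    """Map a string to a point on the ring (MD5 is an illustrative choice)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    """Consistent-hash ring with virtual points (replicas) per node."""

    def __init__(self, nodes, replicas=100):
        # Each node contributes `replicas` points; sort them by ring position.
        self._points = sorted(
            (ring_hash(f"{node}#{r}"), node)
            for node in nodes
            for r in range(replicas)
        )
        self._keys = [point for point, _ in self._points]

    def owner(self, host: str) -> str:
        """Return the node owning the first ring point after hash(host)."""
        i = bisect.bisect_right(self._keys, ring_hash(host)) % len(self._keys)
        return self._points[i][1]


def simulate(num_nodes=8, num_hosts=20000, largest_site=1_000_000):
    """Assign hypothetical hosts to nodes; count hosts and pages per node."""
    ring = Ring([f"crawler-{i}" for i in range(num_nodes)])
    hosts, pages = Counter(), Counter()
    for rank in range(1, num_hosts + 1):
        node = ring.owner(f"site{rank}.example")
        hosts[node] += 1
        # Assumed Zipf-like site size: pages fall off as 1/rank.
        pages[node] += max(1, largest_site // rank)
    return hosts, pages


def imbalance(loads, num_nodes):
    """Ratio of the heaviest node's load to the mean load (1.0 is perfect)."""
    return max(loads.values()) / (sum(loads.values()) / num_nodes)


if __name__ == "__main__":
    hosts, pages = simulate()
    print("host imbalance:", round(imbalance(hosts, 8), 3))
    print("page imbalance:", round(imbalance(pages, 8), 3))
```

The two printed ratios measure different things: the first counts hosts per node, which is what a uniform hash of host names balances, while the second counts pages per node, which is the actual crawling load. Under a heavy-tailed size distribution, the node that happens to own the largest sites carries a disproportionate share of pages regardless of how evenly the hosts themselves are spread.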