Title :
Splider: A split-based crawler of the BT-DHT network and its applications
Author :
Bingshuang Liu ; Shidong Wu ; Tao Wei ; Chao Zhang ; Jun Li ; Jianyu Zhang ; Yu Chen ; Chen Li
Author_Institution :
Inst. of Comput. Sci. & Technol., Peking Univ., Beijing, China
Abstract :
Capturing accurate snapshots of peer-to-peer (P2P) networks, especially those with millions of users, is essential to many P2P-based applications, including those that monitor and analyze P2P networks. The large scale and dynamic nature of P2P networks, however, make this task very challenging. Existing crawlers of P2P networks, for example, often miss a substantial portion of the ID space while unnecessarily crawling numerous nodes repeatedly. In this paper, we design and evaluate a new crawler called Splider. Unlike traditional crawling algorithms that adopt an iterative approach, Splider recursively splits the ID space of P2P nodes to crawl even tiny corners of the ID space, while avoiding crawling the same nodes repeatedly. We further implement a Splider prototype for BT-DHT, a Kademlia-based distributed hash table (DHT) P2P network, that exploits the structure of routing tables at BT-DHT nodes. Experiments show that Splider is able to gather more than 16 million nodes with a 100% recall ratio, whereas a traditional iterative crawler can at best capture only about 8 million nodes with a 66% recall ratio, and its traffic-cost effectiveness is 50% lower than that of Splider. Splider also supports distributed deployment; without any synchronization overhead, it reduces the time needed to capture a full snapshot to only about 3 minutes. Finally, we report and analyze the captured BT-DHT snapshots, including the spatial and temporal distribution of BT-DHT nodes and the existence of sybil and eclipse attacks in BT-DHT.
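The recursive splitting idea described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' implementation: the 16-bit ID width, the bucket size `K`, and the names `make_network`, `query`, and `split_crawl` are all assumptions made for illustration, and `query` is an idealized oracle that returns the `K` IDs closest to a target by XOR distance, whereas a real BT-DHT crawler would send lookup messages to live nodes and read their routing tables.

```python
import random

ID_BITS = 16  # toy ID width (assumption; BT-DHT uses 160-bit IDs)
K = 8         # max IDs returned per query (assumption, Kademlia-style)


def make_network(n, seed=0):
    """Build a hypothetical network of n distinct random node IDs."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1 << ID_BITS), n))


def query(network, target):
    """Idealized oracle: the K node IDs closest to target by XOR distance."""
    return sorted(network, key=lambda nid: nid ^ target)[:K]


def split_crawl(network, prefix=0, depth=0):
    """Crawl the zone of IDs whose top `depth` bits equal `prefix`.

    Returns (set of node IDs found in the zone, number of queries issued).
    """
    shift = ID_BITS - depth
    result = query(network, prefix << shift)  # target: lowest ID in the zone
    in_zone = {nid for nid in result if nid >> shift == prefix}
    # Every ID inside the zone is XOR-closer to the target than any ID
    # outside it, so fewer than K in-zone answers means the zone is complete.
    if len(in_zone) < K:
        return in_zone, 1
    # The zone may hold more nodes than one reply can carry: split it in
    # two disjoint halves and crawl each half separately.
    left, q_left = split_crawl(network, prefix << 1, depth + 1)
    right, q_right = split_crawl(network, (prefix << 1) | 1, depth + 1)
    return left | right, 1 + q_left + q_right


if __name__ == "__main__":
    net = make_network(500)
    found, queries = split_crawl(net)
    print(f"found {len(found)} of {len(net)} nodes using {queries} queries")
```

Because the zones produced by splitting are disjoint, each node is recorded from exactly one leaf zone, which mirrors the abstract's claim of covering the whole ID space without re-crawling nodes. Under the idealized oracle the toy crawl recovers every node; the recall and timing figures reported in the abstract come from the authors' experiments and are not reproduced by this sketch.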
Keywords :
computer network management; computer network security; peer-to-peer computing; BT-DHT network; ID space; Kademlia-based distributed hash table; P2P network; Splider; crawling algorithm; distributed deployment; eclipse attack; peer-to-peer network; snapshot capturing; spatial distribution; split-based crawler; sybil existence; temporal distribution; traffic-cost effectiveness; Accuracy; Bandwidth; Crawlers; Educational institutions; Heuristic algorithms; Peer-to-peer computing; Routing;
Conference_Title :
Consumer Communications and Networking Conference (CCNC), 2014 IEEE 11th
Conference_Location :
Las Vegas, NV
Print_ISBN :
978-1-4799-2356-4
DOI :
10.1109/CCNC.2014.6866591