Title :
Distributed web crawling: A framework for crawling of micro-blog data
Author :
Jie Xia;Wanggen Wan;Renzhong Liu;Guodong Chen;Qing Feng
Author_Institution :
School of Communication and Information Engineering, Shanghai, China
fDate :
7/1/2015 12:00:00 AM
Abstract :
These days, social networks attract people to express and share their interests. We aim to monitor public opinion and make other valuable discoveries using data collected from the social network website Sina Weibo. This paper presents a distributed web crawler framework called SWORM, which runs on the Raspberry Pi (a cheap, card-sized single-board computer) to fetch micro-blog data and outperforms traditional web crawlers in efficiency, scale, scalability and cost. The framework can easily be extended to meet a user's specific needs with a few simple Python scripts. The paper first proposes a model of the micro-blog network to determine what our crawler will crawl from the social website and how. It then introduces the implementation details of the whole distributed system, and finally presents experimental results. We ran several crawlers within our framework on the Raspberry Pi and stored the obtained resources in a shared MongoDB instance, a NoSQL database. The experimental results demonstrate that a distributed framework can greatly improve the efficiency and accuracy of data collection.
Conference_Title :
Smart and Sustainable City and Big Data (ICSSC), 2015 International Conference on
DOI :
10.1049/cp.2015.0255