DocumentCode :
3575355
Title :
A Framework for the Efficient Collection of Big Data from Online Social Networks
Author :
Petrillo, Umberto Ferraro ; Consolo, Stefano
Author_Institution :
Dip. di Sci. Statistiche, Univ. di Roma “Sapienza”, Rome, Italy
fYear :
2014
Firstpage :
34
Lastpage :
41
Abstract :
In this paper, we present a universal framework for collecting publicly available information from Online Social Networks (OSNs). Our proposal is based on a three-level distributed architecture. At the first level, one or more parallel crawler processes are in charge of identifying all the resources that need to be acquired from a target OSN. Once identified, these resources are requested from the intermediate level. This level implements an abstraction layer that allows crawlers to query different OSNs at the same time through the same interface. Here, if the needed resource is available, it is returned immediately. Otherwise, a new request is prepared and shared with a network of remote data collector processes by means of a set of distributed data structures. The architecture is organized to allow a large number of data collectors to operate in parallel, making it possible to download large amounts of data in a relatively short time. We also present the results of experiments conducted on the Twitter and Flickr OSNs to validate our framework.
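The request flow described in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' implementation: the class IntermediateLevel, the functions collector and fake_fetch, and the resource identifiers are all hypothetical names, and an in-process queue stands in for the distributed data structures the paper refers to.

```python
import queue
import threading


class IntermediateLevel:
    """Hypothetical middle tier: serve a stored resource or queue a request."""

    def __init__(self):
        self.store = {}               # resources already collected
        self.pending = queue.Queue()  # stand-in for the distributed request structure
        self.requested = set()        # avoids queuing the same resource twice
        self.lock = threading.Lock()

    def get(self, osn, resource_id):
        """Same interface for every OSN; returns None when the data is not yet available."""
        key = (osn, resource_id)
        with self.lock:
            if key in self.store:
                return self.store[key]
            if key not in self.requested:
                self.requested.add(key)
                self.pending.put(key)
        return None

    def put(self, key, data):
        with self.lock:
            self.store[key] = data


def collector(level, fetch):
    """Data collector: drains pending requests and stores what it fetches."""
    while True:
        try:
            key = level.pending.get_nowait()
        except queue.Empty:
            return
        level.put(key, fetch(*key))


def fake_fetch(osn, resource_id):
    # Placeholder for a real OSN API call (assumption, no network access here).
    return f"{osn}:{resource_id}:payload"


def demo():
    level = IntermediateLevel()
    first = level.get("twitter", "user/42")  # miss: request is queued
    level.get("flickr", "photo/7")           # miss on a second OSN, same interface
    workers = [threading.Thread(target=collector, args=(level, fake_fetch))
               for _ in range(2)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    second = level.get("twitter", "user/42")  # hit: returned immediately
    return first, second
```

In this sketch the crawler role is played by the calls to get inside demo: the first call misses and queues a request, the collectors run in parallel threads, and a later call for the same resource is answered from the store.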
Keywords :
Big Data; data structures; distributed databases; parallel processing; query processing; social networking (online); Flickr OSN; Twitter OSN; abstraction layer; big data collection; crawler parallel processes; distributed data structures; online social networks; publicly available information; remote data collectors; three-level distributed architecture; Crawlers; Data collection; Data mining; Data models; Standards; Twitter; big data; data collection; distributed architecture; online social networks;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Intelligent Networking and Collaborative Systems (INCoS), 2014 International Conference on
Print_ISBN :
978-1-4799-6386-7
Type :
conf
DOI :
10.1109/INCoS.2014.102
Filename :
7057067