DocumentCode :
1796735
Title :
A Software Architecture for Progressive Scanning of On-line Communities
Author :
Baldoni, Roberto ; D'Amore, Fabrizio ; Mecella, Massimo ; Ucci, Daniele
Author_Institution :
Dipt. di Ing. Inf. Autom. e Gestionale, Cyber-Intell. & Inf. Security Center, Sapienza Univ. di Roma, Rome, Italy
fYear :
2014
fDate :
June 30 2014-July 3 2014
Firstpage :
207
Lastpage :
212
Abstract :
We consider a set of on-line communities (e.g., news, blogs, Google groups, Web sites). The content of a community is continuously updated by users, and such updates can be seen by users of other communities. Thus, when creating an update, a user could be influenced by one or more updates, creating a semantic causal relationship among updates. Transitively, these relationships make it possible to trace how information flows across communities. The paper presents a software architecture that progressively scans a set of on-line communities in order to detect such semantic causal relationships. The architecture includes a crawler, large-scale storage, a distributed indexing system, and a mining system. The paper mainly focuses on crawling and indexing.
Keywords :
social networking (online); software architecture; Google groups; Web sites; blogs; crawler; distributed indexing system; information flows; large scale storage; mining system; news; online communities; progressive scanning; semantic causal relationship; software architecture; update; Communities; Computer architecture; Crawlers; Data mining; Indexing; Software architecture; MapR; Nutch; On-line communities; progressive scanning;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Distributed Computing Systems Workshops (ICDCSW), 2014 IEEE 34th International Conference on
Conference_Location :
Madrid
ISSN :
1545-0678
Print_ISBN :
978-1-4799-4182-7
Type :
conf
DOI :
10.1109/ICDCSW.2014.37
Filename :
6888863