DocumentCode :
124147
Title :
Mining Twitter Data with Resource Constraints
Author :
Valkanas, George ; Katakis, Ioannis ; Gunopulos, Dimitrios ; Stefanidis, Antony
Author_Institution :
Univ. of Athens, Athens, Greece
Volume :
1
fYear :
2014
fDate :
11-14 Aug. 2014
Firstpage :
157
Lastpage :
164
Abstract :
Social media analysis constitutes a scientific field that is rapidly gaining ground due to its numerous research challenges and practical applications, as well as the unprecedented availability of data in real time. Several of these applications, such as journalism, crisis management, and advertising, have significant social and economic impact. However, two issues regarding these applications have to be confronted. The first one is the financial cost. Despite the abundance of information, it typically comes at a premium price, and only a fraction is provided free of charge. For example, Twitter, a predominant social media online service, grants researchers and practitioners free access to only a small proportion (1%) of its publicly available stream. The second issue is the computational cost. Even when the full stream is available, off-the-shelf approaches are unable to operate in such settings due to the real-time computational demands. Consequently, real-world applications as well as research efforts that exploit such information are limited to utilizing only a subset of the available data. In this paper, we are interested in evaluating the extent to which analytical processes are affected by the aforementioned limitation. In particular, we apply a plethora of analysis processes to two subsets of Twitter public data, obtained through the service's sampling APIs. The first one is the default 1% sample, whereas the second is the Garden hose sample that our research group has access to, returning 10% of all public data. We extensively evaluate their relative performance in numerous scenarios.
Keywords :
application program interfaces; data mining; pricing; social networking (online); Garden hose sample; Twitter public data; computational cost; financial cost; mining Twitter data; premium price; resource constraints; service sampling API; social media analysis; social media online service; Correlation; Crisis management; Event detection; Media; Real-time systems; Sentiment analysis; Twitter;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2014 IEEE/WIC/ACM International Joint Conferences on
Conference_Location :
Warsaw
Type :
conf
DOI :
10.1109/WI-IAT.2014.29
Filename :
6927538