DocumentCode :
710176
Title :
STREAMCUBE: Hierarchical spatio-temporal hashtag clustering for event exploration over the Twitter stream
Author :
Wei Feng ; Chao Zhang ; Wei Zhang ; Jiawei Han ; Jianyong Wang ; Aggarwal, Charu ; Jianbin Huang
Author_Institution :
Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
fYear :
2015
fDate :
13-17 April 2015
Firstpage :
1561
Lastpage :
1572
Abstract :
What is happening around the world? When and where? Mining the geo-tagged Twitter stream makes it possible to answer these questions in real time. Although a single tweet can be short and noisy, proper aggregation of tweets can provide meaningful results. In this paper, we focus on hierarchical spatio-temporal hashtag clustering techniques. Our system has the following features: (1) Exploring events (hashtag clusters) at different spatial granularities. Users can zoom in and out on maps to find out what is happening in a particular area. (2) Exploring events at different time granularities. Users can choose to see what is happening today or what happened in the past week. (3) An efficient single-pass algorithm for event identification, which provides human-readable hashtag clusters. (4) Efficient event ranking, which aims to find burst events and localized events given a particular region and time frame. To support aggregation at different spatial and time granularities, we propose a data structure called STREAMCUBE, which extends the data cube structure from the database community with spatial and temporal hierarchies. To achieve high scalability, we propose a divide-and-conquer method to construct the STREAMCUBE. To support flexible event ranking with different weights, we propose a top-k based index. Several efficient methods are used to speed up event similarity computations. Finally, we have conducted extensive experiments on a real Twitter dataset. Experimental results show that our framework can provide meaningful results with high scalability.
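The idea of aggregating hashtags over a spatial and temporal hierarchy can be illustrated with a minimal sketch. This is not the STREAMCUBE implementation from the paper: the class name `HashtagCube`, the helpers `cell_key` and `time_key`, the grid-halving spatial cells, and the hour/day time slots are all assumptions chosen for illustration. The sketch also updates every level eagerly on each insert, whereas the abstract describes a divide-and-conquer construction that is not reproduced here.

```python
from collections import Counter


def cell_key(lat, lon, level):
    """Hypothetical spatial cell: a grid with 2**level cells per axis."""
    n = 2 ** level
    x = min(int((lon + 180.0) / 360.0 * n), n - 1)
    y = min(int((lat + 90.0) / 180.0 * n), n - 1)
    return (level, x, y)


def time_key(ts, granularity):
    """Hypothetical time slot: bucket a Unix timestamp into an hour or a day."""
    size = {"hour": 3600, "day": 86400}[granularity]
    return (granularity, ts // size)


class HashtagCube:
    """Toy spatio-temporal cube: hashtag counts per (cell, time slot)."""

    def __init__(self, max_level=4):
        self.max_level = max_level
        self.cells = {}  # (cell_key, time_key) -> Counter of hashtags

    def add(self, lat, lon, ts, hashtags):
        # Single pass over the stream: each tweet updates one cuboid
        # per spatial level and per time granularity.
        for level in range(self.max_level + 1):
            for gran in ("hour", "day"):
                key = (cell_key(lat, lon, level), time_key(ts, gran))
                self.cells.setdefault(key, Counter()).update(hashtags)

    def top(self, lat, lon, level, ts, gran, k=3):
        # Most frequent hashtags in the cell and time slot containing the point.
        key = (cell_key(lat, lon, level), time_key(ts, gran))
        return self.cells.get(key, Counter()).most_common(k)


cube = HashtagCube(max_level=4)
cube.add(37.57, 126.98, 1429000000, ["#icde", "#seoul"])
cube.add(37.57, 126.98, 1429000100, ["#icde"])
cube.add(40.70, -74.00, 1429000200, ["#nyc"])
print(cube.top(37.57, 126.98, level=0, ts=1429000000, gran="day", k=1))
print(cube.top(40.70, -74.00, level=4, ts=1429000200, gran="hour"))
```

Querying at `level=0` corresponds to a fully zoomed-out map, while higher levels restrict the result to a smaller area. Raw frequency stands in here for the event ranking mentioned in the abstract, which instead targets burst and localized events.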
Keywords :
data mining; database management systems; divide and conquer methods; geography; pattern clustering; social networking (online); STREAMCUBE; Twitter stream; burst events; data cube structure; database community; divide-and-conquer method; event exploration; event identification; event ranking; event similarity computation; geo-tagged Twitter stream mining; hierarchical spatio-temporal hashtag clustering; human-readable hashtag clusters; localized events; single-pass algorithm; space granularity; spatial hierarchy; temporal hierarchy; time granularity; top-k based index; Clustering algorithms; Media; Noise measurement; Nominations and elections; Real-time systems; Semantics; Twitter;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Engineering (ICDE), 2015 IEEE 31st International Conference on
Conference_Location :
Seoul
Type :
conf
DOI :
10.1109/ICDE.2015.7113425
Filename :
7113425