Title :
Dart: A Geographic Information System on Hadoop
Author :
Hong Zhang ; Zhibo Sun ; Zixia Liu ; Chen Xu ; Liqiang Wang
Author_Institution :
Dept. of Comput. Sci., Univ. of Wyoming, Laramie, WY, USA
Abstract :
In big data research, analytics on spatio-temporal data from social media is one of the fastest-growing areas and poses a major challenge for both research and applications. Users need an efficient and flexible computing and storage platform to analyze spatio-temporal patterns in huge amounts of social media data. This paper introduces Dart, a scalable, distributed geographic information system based on Hadoop and HBase. Dart provides a hybrid table schema for storing spatial data in HBase so that the Reduce process can be omitted for operations such as calculating the mean center and the median center. It employs pre-splitting and hash techniques to avoid data imbalance and hot region problems. It also supports massive spatial data analysis such as K-Nearest Neighbors (KNN) and Geometric Median Distribution. In our experiments, we evaluate the performance of Dart by processing 160 GB of Twitter data on an Amazon EC2 cluster. The experimental results show that Dart is very scalable and efficient.
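The two statistics named in the abstract, the mean center and the median center, can be illustrated with a minimal single-machine sketch. It does not reproduce Dart's Hadoop/HBase implementation, which this record does not describe. It assumes points are planar (x, y) pairs, takes the median center to be the geometric median, and uses Weiszfeld's iteration as a standard stand-in method. The function names `mean_center` and `median_center` are hypothetical.

```python
import math


def mean_center(points):
    """Return the arithmetic mean of a list of planar (x, y) points."""
    n = len(points)
    sum_x = sum(p[0] for p in points)
    sum_y = sum(p[1] for p in points)
    return (sum_x / n, sum_y / n)


def total_distance(center, points):
    """Sum of Euclidean distances from center to every point."""
    return sum(math.hypot(px - center[0], py - center[1]) for px, py in points)


def median_center(points, tol=1e-9, max_iter=1000):
    """Approximate the geometric median with Weiszfeld's iteration.

    Assumption: this is a standard textbook method, not necessarily
    the one used by Dart.
    """
    x, y = mean_center(points)  # start from the mean center
    for _ in range(max_iter):
        weight_sum = wx = wy = 0.0
        for px, py in points:
            d = math.hypot(px - x, py - y)
            if d < tol:
                # Simplification: skip a point that coincides with the
                # current estimate to avoid dividing by zero.
                continue
            w = 1.0 / d
            weight_sum += w
            wx += w * px
            wy += w * py
        if weight_sum == 0.0:
            break  # every point coincides with the estimate
        nx, ny = wx / weight_sum, wy / weight_sum
        if math.hypot(nx - x, ny - y) < tol:
            return (nx, ny)
        x, y = nx, ny
    return (x, y)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (100.0, 100.0)]
    print("mean center:", mean_center(pts))
    print("median center:", median_center(pts))
```

The mean center needs only running sums and a count, so partial results from separate data partitions can be combined cheaply. The median center is iterative and far less sensitive to outliers such as the last point in the usage example.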
Keywords :
Big Data; data analysis; geographic information systems; parallel programming; Amazon EC2 cluster; Big Data research; Dart; HBase; Hadoop; Twitter data; geometric median distribution; hash techniques; hybrid table schema; k-nearest neighbors; massive spatial data analysis; pre-splitting techniques; scalable distributed geographic information system; social media data; spatio-temporal data; Algorithm design and analysis; Computational modeling; Data analysis; Geographic information systems; Media; Spatial databases; Twitter; GIS; Hadoop; HBase; KNN; Mean Center; Median Center; Social Network;
Conference_Title :
Cloud Computing (CLOUD), 2015 IEEE 8th International Conference on
Conference_Location :
New York City, NY
Print_ISBN :
978-1-4673-7286-2
DOI :
10.1109/CLOUD.2015.22