Title :
An optimized interestingness hotspot discovery framework for large gridded spatio-temporal datasets
Author :
Fatih Akdag;Christoph F. Eick
Author_Institution :
Computer Science Department, University of Houston
Abstract :
We define interestingness hotspots as contiguous regions in space that are interesting according to a domain expert's notion of interestingness, as captured by an interestingness function. This paper centers on finding interestingness hotspots in very large gridded datasets, which are common in scientific computing. Mining large gridded datasets with many variables and measurements requires a scalable framework that can process large amounts of data efficiently. In our recent work, we proposed a computational framework that discovers interestingness hotspots in gridded datasets using a three-step approach consisting of seeding, hotspot growing, and post-processing. In this paper, we significantly improve the efficiency of the framework by utilizing parallel processing and employing more efficient data structures and algorithms. We propose a novel heap-based hotspot growing algorithm that significantly reduces the cost of the hotspot growing phase. In addition, we propose a graph-based preprocessing algorithm that reduces the number of hotspots grown by merging some hotspot seeds. Other improvements to the framework include incremental calculation of interestingness functions and growing hotspots in parallel. The improved framework is evaluated in a case study on a very large 4-dimensional gridded air pollution dataset, in which we find interestingness hotspots with respect to pollutants.
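The abstract names heap-based hotspot growing and incremental interestingness calculation without giving details, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a 2-dimensional grid, 4-neighbor contiguity, and a hypothetical interestingness function (the excess of the region mean over a threshold, scaled by region size); the names `interestingness`, `grow_hotspot`, `threshold`, and `beta` are illustrative assumptions. For this particular function, the neighbor with the largest cell value always yields the largest interestingness after being added, so a max-heap keyed on cell value is enough to pick the best candidate.

```python
import heapq


def interestingness(total, n, threshold=1.0, beta=1.1):
    """Hypothetical reward: excess of the region mean over a threshold, scaled by size."""
    if n == 0:
        return 0.0
    mean = total / n
    return max(0.0, mean - threshold) * n ** beta


def grow_hotspot(grid, seed):
    """Greedily grow a contiguous region from `seed` while interestingness increases."""
    rows, cols = len(grid), len(grid[0])
    region = {seed}
    total = grid[seed[0]][seed[1]]      # running sum, updated incrementally
    best = interestingness(total, 1)
    heap = []                           # max-heap via negated cell values
    queued = {seed}                     # cells already in the region or the heap

    def push_neighbors(r, c):
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in queued:
                queued.add((nr, nc))
                heapq.heappush(heap, (-grid[nr][nc], nr, nc))

    push_neighbors(*seed)
    while heap:
        neg_val, r, c = heapq.heappop(heap)
        # Only the running sum and count are needed; the region is never rescanned.
        candidate = interestingness(total - neg_val, len(region) + 1)
        if candidate <= best:
            break                       # the best neighbor does not help, so stop
        region.add((r, c))
        total -= neg_val
        best = candidate
        push_neighbors(r, c)
    return region, best
```

Calling `grow_hotspot(grid, (row, col))` on a list-of-lists grid returns the set of cells in the grown region together with its interestingness. Keeping a running sum and count makes each evaluation constant time, and the heap replaces a linear scan of all neighbors at every growth step.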
Keywords :
"Algorithm design and analysis","Atmospheric measurements","Merging","Clustering algorithms","Complexity theory","Pollution measurement","Runtime"
Conference_Title :
Big Data (Big Data), 2015 IEEE International Conference on
DOI :
10.1109/BigData.2015.7363982