DocumentCode :
1665387
Title :
Lightweight Distributed Execution Engine for Large-Scale Spatial Join Query Processing
Author :
Jianting Zhang ; Simin You ; Le Gruenwald
Author_Institution :
Dept. of Comput. Sci., City Coll. of New York, New York, NY, USA
fYear :
2015
Firstpage :
150
Lastpage :
157
Abstract :
Existing Big Data systems are mostly designed for relational data. They are either incapable of processing large-scale semi-structured data or inefficient at it, owing to inherent limitations in data abstraction, indexing support and exposure to native parallel programming tools. In this study, we report our work in developing a lightweight distributed execution engine for spatial join query processing on large-scale geospatial data. By integrating data parallel designs for single computing nodes, our execution engine is able to automatically dispatch data partitions to distributed computing nodes for efficient local execution on multi-core CPUs and GPUs. The execution engine supports asynchronous data transfer over the network, asynchronous disk I/O and asynchronous computing. It also directly accesses distributed file systems to support creating and using indices conveniently and efficiently. The engine is lightweight by design, with fewer than 1,000 Lines Of Code (LOC). In addition, experiments using a real-world application have demonstrated significant efficiency improvements over our previous work on extending a leading in-memory Big Data system (Impala) for spatial join query processing.
Keywords :
Big Data; data structures; distributed databases; electronic data interchange; parallel programming; query processing; relational databases; GPU; Impala; LOC; Lines Of Code; asynchronous computing; asynchronous data transfer over network; asynchronous disk I/O; data abstraction; data parallel design; data partition; distributed computing nodes; distributed file system; in-memory Big Data system; indexing support; large-scale geospatial data; large-scale semistructured data; large-scale spatial join query processing; lightweight distributed execution engine; multicore CPU; parallel programming tool; relational data; single computing node; Distributed databases; Engines; Indexing; Instruction sets; Query processing; Sparks; Spatial databases; Distributed Computing; Execution Engine; Lightweight; Spatial Join;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Big Data (BigData Congress), 2015 IEEE International Congress on
Conference_Location :
New York, NY
Print_ISBN :
978-1-4673-7277-0
Type :
conf
DOI :
10.1109/BigDataCongress.2015.30
Filename :
7207214