DocumentCode :
167656
Title :
Optimizing the Join Operation on Hive to Accelerate Cross-Matching in Astronomy
Author :
Liang Li ; Dixin Tang ; Taoying Liu ; Hong Liu ; Wei Li ; Chenzhou Cui
Author_Institution :
Institute of Computing Technology, Beijing, China
fYear :
2014
fDate :
19-23 May 2014
Firstpage :
1735
Lastpage :
1745
Abstract :
Cross-matching in astronomy is a basic procedure for comprehensively analyzing the relations among different celestial objects. The aim is to search for celestial objects in different catalogs and determine whether they are the same object. Cross-matching can essentially be expressed as a join query statement. Since celestial catalogs usually contain billions of stars, the join operator must be carefully designed and optimized for efficiency. In this paper, we focus on performing cross-matching with MapReduce-based join operators. The challenge is how to optimize the join operators to satisfy the specific requirements of cross-matching. We therefore propose an optimized method and investigate its efficiency through theoretical analysis and experiments. Our study shows that the method improves remarkably on previous work, especially when the data is very large.
Keywords :
astronomy computing; optimisation; query processing; string matching; MapReduce; astronomy cross-matching; celestial object relations; join operation optimization; join query statement; Conferences; Distributed processing; Astronomy; Cross-Matching; Join
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel & Distributed Processing Symposium Workshops (IPDPSW), 2014 IEEE International
Conference_Location :
Phoenix, AZ
Print_ISBN :
978-1-4799-4117-9
Type :
conf
DOI :
10.1109/IPDPSW.2014.193
Filename :
6969584