Regional Information Center for Science and Technology

DocumentCode :

3576384

Title :

A data reusing strategy based on hive

Author :

Heng Xie ; Mei Wang ; Jiajin Le

Author_Institution :

Sch. of Comput. Sci. & Technol., DongHua Univ., Shanghai, China

fYear :

2014

Firstpage :

367

Lastpage :

373

Abstract :

Large-scale data processing has emerged as an important research issue. Reusing calculation results can greatly improve its efficiency. This paper proposes an efficient data reusing strategy based on the data warehouse tool Hive, which runs on the MapReduce framework. Since the intermediate calculation results are already stored in the DFS by the different jobs in a MapReduce workflow, the key issue is how to find the reuse information. The paper addresses this problem in two steps. First, we define a joint object to organize and store the features of intermediate calculation results. Then, based on joint objects, we provide an algorithm that matches them and generates the reuse plan. The paper also provides a way to obtain the best reuse strategy when more than one calculation result can be used. We conduct experiments on the TPC-H and SSB benchmarks. The experimental results demonstrate that our strategy significantly improves the efficiency of large-scale data processing and has little effect on queries executed for the first time.

Keywords :

data handling; data warehouses; Hive; MapReduce framework; MapReduce workflow; SSB benchmarks; TPC-H benchmarks; data reusing strategy; data warehouse tool; joint objects; large scale data process; reuse plan; Amplitude modulation; Benchmark testing; Computational modeling; Data models; Educational institutions; Finite element analysis; Joints; Hive; MapReduce; calculation results reuse; join-object;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Data Science and Advanced Analytics (DSAA), 2014 International Conference on

Type :

conf

DOI :

10.1109/DSAA.2014.7058098

Filename :

7058098

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3576384