DocumentCode :
249314
Title :
A Compatible LZMA ORC-Based Optimization for High Performance Big Data Load
Author :
Liping Zhang ; Qi Chen ; Kai Miao
Author_Institution :
Intel Corp., Shanghai, China
fYear :
2014
fDate :
June 27 2014-July 2 2014
Firstpage :
80
Lastpage :
87
Abstract :
This paper presents several efficient ways to improve data loading and optimize storage in a Hadoop cluster. We design a new method that leverages LZMA and ORC to gain a performance edge, and we improve the ORC implementation in HDFS to achieve a higher compression ratio and better I/O throughput. We present a complete optimization strategy for efficient big data loading that combines byte-array-oriented processing, record splitting, reduced serialization and shuffle, and reduced landing of intermediate data, yielding a substantial performance boost. This paper provides preliminary results and analysis. Evaluation results indicate that our method achieves a significant performance improvement for big data load.
Keywords :
Big Data; optimisation; HDFS; Hadoop cluster; IO throughput; compatible LZMA ORC-based optimization; complete optimization strategy; high performance big data load; middle data landing; Arrays; Big data; Dictionaries; Loading; Optimization; Sorting; Throughput; HBase; Hadoop; I/O compression; LZMA; ORC; bulk load; performance optimization;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Big Data (BigData Congress), 2014 IEEE International Congress on
Conference_Location :
Anchorage, AK
Print_ISBN :
978-1-4799-5056-0
Type :
conf
DOI :
10.1109/BigData.Congress.2014.21
Filename :
6906764