DocumentCode
263405
Title
Correlation Aware Technique for SQL to NoSQL Transformation
Author
Jen-Chun Hsu ; Ching-Hsien Hsu ; Shih-Chang Chen ; Yeh-Ching Chung
Author_Institution
Dept. Comput. Sci. & Inf. Eng., Chung Hua Univ., Hsinchu, Taiwan
fYear
2014
fDate
12-14 July 2014
Firstpage
43
Lastpage
46
Abstract
For better efficiency in parallel and distributed computing, Apache Hadoop distributes imported data randomly across data nodes. This mechanism provides some advantages for general data analysis. Following the same concept, Apache Sqoop separates each table into four parts and randomly distributes them across data nodes. However, this data placement mechanism still raises a database performance concern. This paper proposes a Correlation Aware method on Sqoop (CA_Sqoop) to improve data placement. It gathers related data as close together as possible to reduce the data transformation cost on the network and to improve performance in terms of database usage. CA_Sqoop also considers table correlation and size for better data locality and query efficiency. Simulation results show that the data locality of CA_Sqoop is two times better than that of the original Apache Sqoop.
Keywords
SQL; parallel processing; public domain software; Apache Hadoop; Apache Sqoop concept; CA_Sqoop; NoSQL transformation; SQL transformation; correlation aware technique; data locality; data nodes; data placement mechanism; data transformation cost reduction; database performance; distributed computing; general data analysis; parallel computing; query efficiency; Cloud computing; Computer architecture; Correlation; Data processing; Distributed databases; File systems; Big Data; Cloud computing; Data locality; NoSQL; Sqoop;
fLanguage
English
Publisher
ieee
Conference_Titel
Ubi-Media Computing and Workshops (UMEDIA), 2014 7th International Conference on
Conference_Location
Ulaanbaatar
Print_ISBN
978-1-4799-4267-1
Type
conf
DOI
10.1109/U-MEDIA.2014.27
Filename
6916323