Title :
Parallelizing R in Hadoop (A Work-in-Progress Study)
Author :
Yen-Zhou Huang;Yu-Ling Chen;Chia-Ping Tsai;Hung-Chang Hsiao
Author_Institution :
Dept. of Comput. Sci. &
Abstract :
R is a popular programming language widely adopted by data scientists. However, standard R can only be executed in a single-machine environment. Although R can be linked to Hadoop through tools such as RHadoop, R users must then develop their R scripts based on the MapReduce framework. This demands a high level of skill from R programmers, who must parallelize their R programs in terms of Map and Reduce jobs, and it discourages them from performing R computations in distributed environments when the workload outpaces the capacity of a single machine. We present an implementation for parallelizing R in Hadoop in this paper. Our objective is to allow R users to run their R scripts, which are developed in a single-machine environment, in Hadoop without modification. While this research is still ongoing, we report our preliminary experience in hiding the complexity of migrating and running such R scripts in Hadoop.
Keywords :
"Java","Big data","Proposals","Databases","Distributed computing","Semantics","Grammar"
Conference_Title :
Smart City/SocialCom/SustainCom (SmartCity), 2015 IEEE International Conference on
DOI :
10.1109/SmartCity.2015.218