Regional Information Center for Science and Technology (RICEST) - RABID: A Distributed Parallel R for Large Datasets

DocumentCode :

249487

Title :

RABID: A Distributed Parallel R for Large Datasets

Author :

Hao Lin ; Shuo Yang ; Midkiff, Samuel P.

Author_Institution :

Electr. & Comput. Eng., Purdue Univ., West Lafayette, IN, USA

fYear :

2014

fDate :

June 27 2014-July 2 2014

Firstpage :

725

Lastpage :

732

Abstract :

Large-scale data mining and deep data analysis are increasingly important for both enterprise and scientific applications. Statistical languages provide rich functionality and ease of use for data analysis and modeling, and have a large user base. R is one of the most widely used of these languages, but it is limited to a single-threaded execution model and to problem sizes that fit on a single node. This paper describes a highly parallel R system called RABID (R Analytics for BIg Data) that maintains R compatibility, leverages Spark, a MapReduce-like distributed framework, and achieves high performance and scaling across clusters. Our experimental evaluation shows that RABID performs up to 5x faster than Hadoop and 20x faster than RHIPE on two data mining applications.

Keywords :

data mining; distributed processing; statistical analysis; MapReduce-like distributed Spark; RABID; data analysis; data mining; distributed parallel R; enterprise applications; large datasets; scientific applications; single-threaded execution model; statistical languages; Data structures; Distributed databases; Fault tolerance; Fault tolerant systems; Programming; Servers; Sparks; Big Data analytics; Data mining; Distributed Computing; R;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Big Data (BigData Congress), 2014 IEEE International Congress on

Conference_Location :

Anchorage, AK

Print_ISBN :

978-1-4799-5056-0

Type :

conf

DOI :

10.1109/BigData.Congress.2014.107

Filename :

6906850

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=249487