Regional Information Center for Science and Technology (RICeST) - Keynote address: Divide and Recombine: An approach for analyzing large datasets

DocumentCode :

2487106

Title :

Keynote address: Divide and Recombine: An approach for analyzing large datasets

Author :

Hanrahan, P.

fYear :

2012

fDate :

14-15 Oct. 2012

Firstpage :

Lastpage :

Abstract :

Summary form only given. Analyzing large datasets is often difficult because systems and algorithms do not scale. Even routine processing tasks are difficult to run and may take a long time. Many common analytical algorithms cannot be applied to large datasets because they are superlinear in either time or space. In this talk I will describe our approach to analyzing large datasets, which we call Divide and Recombine (D&R). D&R is built using RHIPE, a system that runs parallel R map-reduce jobs using Hadoop. We use D&R to run virtual experiments over large datasets. In a virtual experiment, we sample the data using a technique from experimental design, then analyze the results of that experiment, and finally combine all the experiments into a single result. This is joint work with Bill Cleveland.
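The divide, analyze, and recombine steps described in the abstract can be illustrated with a minimal sketch. This is not the RHIPE API and not the authors' implementation: it is plain single-process Python, and the function names (divide, fit_line, recombine, make_data), the random-replicate division, the least-squares analysis, and the averaging recombination are all assumptions chosen for illustration. The data are synthetic.

```python
import random
from statistics import fmean


def divide(rows, n_subsets, seed=0):
    """Divide: shuffle the rows and deal them into n_subsets random subsets.

    Random division is an assumption; the abstract only says the data are
    sampled using a technique from experimental design.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    return [shuffled[i::n_subsets] for i in range(n_subsets)]


def fit_line(rows):
    """Analyze one subset: ordinary least-squares fit of y = a + b * x."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in rows)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def recombine(estimates):
    """Recombine: average the per-subset (intercept, slope) estimates."""
    intercepts, slopes = zip(*estimates)
    return fmean(intercepts), fmean(slopes)


def divide_and_recombine(rows, n_subsets=10, seed=0):
    subsets = divide(rows, n_subsets, seed)
    # Each per-subset fit is independent, so in a map-reduce setting the
    # fits could run as parallel map tasks; here they run sequentially.
    return recombine([fit_line(subset) for subset in subsets])


def make_data(n=10_000, seed=1):
    """Synthetic data for illustration only: y = 2 + 3x + Gaussian noise."""
    rng = random.Random(seed)
    return [(x, 2.0 + 3.0 * x + rng.gauss(0.0, 1.0))
            for x in (rng.uniform(0.0, 10.0) for _ in range(n))]


if __name__ == "__main__":
    intercept, slope = divide_and_recombine(make_data(), n_subsets=10)
    print(f"intercept ~ {intercept:.3f}, slope ~ {slope:.3f}")
```

Because each subset is analyzed independently, the per-subset cost depends only on the subset size, which is why a scheme of this kind can sidestep algorithms that are superlinear in the full dataset size. Averaging is only one possible recombination rule.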

Keywords :

data analysis; parallel processing; D&R datasets; Hadoop; RHIPE; analytical algorithms; divide and recombine datasets; large dataset analysis; parallel R map-reduce jobs; routine processing tasks; virtual experiments

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Large Data Analysis and Visualization (LDAV), 2012 IEEE Symposium on

Conference_Location :

Seattle, WA

Print_ISBN :

978-1-4673-4732-7

Type :

conf

DOI :

10.1109/LDAV.2012.6378969

Filename :

6378969

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2487106