Regional Information Center for Science and Technology - Shared disk big data analytics with Apache Hadoop

DocumentCode :

2038175

Title :

Shared disk big data analytics with Apache Hadoop

Author :

Mukherjee, Arjun ; Datta, Jishnu ; Jorapur, R. ; Singhvi, R. ; Haloi, S. ; Akram, W.

Author_Institution :

Symantec Corp. ICON, Pune, India

fYear :

2012

fDate :

18-22 Dec. 2012

Firstpage :

Lastpage :

Abstract :

Big Data is a term applied to data sets whose size is beyond the ability of traditional software technologies to capture, store, manage and process within a tolerable elapsed time. The popular assumption around Big Data analytics is that it requires internet-scale scalability: hundreds of compute nodes with attached storage. In this paper, we debate the need for a massively scalable distributed computing platform for Big Data analytics in traditional businesses. For organizations that don't need horizontal, internet-order scalability in their analytics platform, Big Data analytics can be built on top of a traditional POSIX cluster file system employing a shared storage model. In this study, we compared a widely used clustered file system, VERITAS Cluster File System (SF-CFS), with the Hadoop Distributed File System (HDFS) using popular Map-reduce benchmarks such as Terasort, DFS-IO and Gridmix on top of Apache Hadoop. In our experiments, VxCFS not only matched the performance of HDFS but also outperformed it in many cases. This way, enterprises can fulfill their Big Data analytics needs with a traditional, existing shared storage model without migrating to a different storage model in their data centers. This approach also brings other benefits such as stability and robustness, a rich set of features, and compatibility with traditional analytics applications.

Keywords :

data analysis; storage management; Apache Hadoop; DFS-IO; Gridmix; HDFS; Hadoop Distributed File System; Internet scale scalability; Map-reduce benchmark; POSIX Cluster File System; SF-CFS; Terasort; VERITAS Cluster File System; clustered file system; data capture; data center; data management; data processing; data storage; massively scalable distributed computing platform; shared disk Big Data analytics; shared storage model; Analytics; BigData; Cloud; Clustered File Systems; Hadoop;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

High Performance Computing (HiPC), 2012 19th International Conference on

Conference_Location :

Pune

Print_ISBN :

978-1-4673-2372-7

Electronic_ISBN :

978-1-4673-2370-3

Type :

conf

DOI :

10.1109/HiPC.2012.6507520

Filename :

6507520

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2038175