DocumentCode :
2038175
Title :
Shared disk big data analytics with Apache Hadoop
Author :
Mukherjee, Arjun ; Datta, Jishnu ; Jorapur, R. ; Singhvi, R. ; Haloi, S. ; Akram, W.
Author_Institution :
Symantec Corp. ICON, Pune, India
fYear :
2012
fDate :
18-22 Dec. 2012
Firstpage :
1
Lastpage :
6
Abstract :
Big Data is a term applied to data sets whose size is beyond the ability of traditional software technologies to capture, store, manage and process within a tolerable elapsed time. The popular assumption around Big Data analytics is that it requires internet-scale scalability: hundreds of compute nodes with attached storage. In this paper, we debate the need for a massively scalable distributed computing platform for Big Data analytics in traditional businesses. For organizations that don't need horizontal, internet-order scalability in their analytics platform, Big Data analytics can be built on top of a traditional POSIX cluster file system employing a shared storage model. In this study, we compared a widely used clustered file system, VERITAS Cluster File System (SF-CFS), with the Hadoop Distributed File System (HDFS) using popular Map-reduce benchmarks such as Terasort, DFS-IO and Gridmix on top of Apache Hadoop. In our experiments, VxCFS not only matched the performance of HDFS but also outperformed it in many cases. Enterprises can therefore fulfill their Big Data analytics needs with a traditional, existing shared storage model without migrating to a different storage model in their data centers. This approach also brings other benefits, such as stability and robustness, a rich set of features, and compatibility with traditional analytics applications.
Keywords :
data analysis; storage management; Apache Hadoop; DFS-IO; Gridmix; HDFS; Hadoop Distributed File System; Internet scale scalability; Map-reduce benchmark; POSIX Cluster File System; SF-CFS; Terasort; VERITAS Cluster File System; clustered file system; data capture; data center; data management; data processing; data storage; massively scalable distributed computing platform; shared disk Big Data analytics; shared storage model; Analytics; BigData; Cloud; Clustered File Systems; Hadoop;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
High Performance Computing (HiPC), 2012 19th International Conference on
Conference_Location :
Pune
Print_ISBN :
978-1-4673-2372-7
Electronic_ISBN :
978-1-4673-2370-3
Type :
conf
DOI :
10.1109/HiPC.2012.6507520
Filename :
6507520