DocumentCode :
125418
Title :
Effectiveness Assessment of Solid-State Drive Used in Big Data Services
Author :
Wei Tan ; Fong, Liana ; Yanbin Liu
Author_Institution :
T.J. Watson Res. Center, IBM, Yorktown Heights, NY, USA
fYear :
2014
fDate :
June 27 2014-July 2 2014
Firstpage :
393
Lastpage :
400
Abstract :
Big data poses challenges to the technologies required to process data of high volume, velocity, variety, and veracity. Among these challenges, the storage and computing resources required by big data analytics are usually huge, and as a result big data capabilities are often provisioned in the cloud and delivered in the form of Web-based services. Solid-state drive (SSD) is widely used nowadays as an elementary hardware feature in cloud infrastructure for big data services. For example, Amazon Web Services (AWS) offers EC2 instances with SSD storage, and its key-value data store, DynamoDB, is backed by SSD for superior performance. Compared to hard disk drive (HDD), SSD prevails in both access latency and bandwidth. In the foreseeable future, SSD would be readily available on commodity servers, though its capacity would be neither large enough nor cost-effective to accommodate big data on its own. Therefore, it is essential to investigate how to efficiently leverage SSD as one layer in a storage hierarchy in addition to HDD. In this paper, we investigate the effectiveness of using SSD in three workloads, namely standalone Hadoop MapReduce jobs, Hive jobs, and HBase queries. Firstly, we devise an approach that enables the Hadoop Distributed File System (HDFS) to have an SSD-HDD storage hierarchy. Secondly, we investigate the I/O involved in different phases of Hadoop jobs and design different schemes to place data selectively in the aforementioned storage hierarchy. Afterward, the effectiveness of the different schemes is evaluated with respect to job run time. Finally, we summarize best practices of data placement for the examined workloads in an SSD-HDD storage hierarchy.
Keywords :
Big Data; disc drives; hard discs; parallel programming; storage management; AWS; Amazon Web service; Big Data analytics; Big Data services; HBase queries; HDD; HDFS; Hadoop MapReduce jobs; Hadoop distributed file system; Hive jobs; SSD; SSD-HDD storage hierarchy; cloud infrastructure; cloud provisioning; hard disk drive; solid-state drive; Benchmark testing; Big data; Distributed databases; Indexes; Servers; Throughput; HBase; Hadoop; big data; solid-state drive;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Web Services (ICWS), 2014 IEEE International Conference on
Conference_Location :
Anchorage, AK
Print_ISBN :
978-1-4799-5053-9
Type :
conf
DOI :
10.1109/ICWS.2014.63
Filename :
6928923