DocumentCode :
3603672
Title :
HM: A Column-Oriented MapReduce System on Hybrid Storage
Author :
Sai Wu ; Gang Chen ; Ke Chen ; Feng Li ; Lidan Shou
Author_Institution :
Coll. of Comput. Sci., Zhejiang Univ., Hangzhou, China
Volume :
27
Issue :
12
Year :
2015
Firstpage :
3304
Lastpage :
3317
Abstract :
The solid-state hybrid drive (SSHD) incorporates a small NAND flash memory into a hard drive, resulting in an integrated device with combined Hard Disk Drive (HDD) and Solid State Disk (SSD) storage. By identifying performance-critical data and buffering it in the SSD part, an SSHD can deliver better performance than a standard hard drive. However, that requires a significant redesign of existing data processing systems. In this paper, we examine the problem of efficiently processing relational data using MapReduce on a cluster that uses SSHDs as the underlying storage devices. We present the design of Hybrid MapReduce (HM), a column-oriented MapReduce system, which adopts a storage layout, query optimizer, data index, and compression algorithm different from those of previous MapReduce systems. In HM, the Distributed File System (DFS) is deployed on SSHDs, and the data layout (how data chunks are disseminated to HDDs and SSDs) plays a key role in performance. Hence, an approximate algorithm is used to tune the data layout adaptively to maximize query performance. We evaluate HM using the TPC-H benchmark, and the results show that with our new design, the hybrid system can provide performance similar to that of the SSD-only system.
Keywords :
data compression; data handling; disc drives; distributed databases; hard discs; network operating systems; parallel processing; query processing; DFS; HDD; HM; SSHD; TPC-H benchmark; column-oriented MapReduce system; compression algorithm; data chunks; data index; data layout; distributed file system; hard disk drive; hybrid MapReduce design; hybrid storage; query optimizer; relational data processing system; small NAND flash memory; solid state disk storage; solid-state hybrid drive; storage layout; Adaptation models; Data models; Distributed databases; Drives; Engines; Flash memories; Hard disks; Layout; Query processing; Cost Model; Hadoop; Index; MapReduce; Query Processing; SSHD;
Language :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2015.2453961
Filename :
7155542