Title :
A Distributed Cache for Hadoop Distributed File System in Real-Time Cloud Services
Author :
Zhang, Jing ; Wu, Gongqing ; Hu, Xuegang ; Wu, Xindong
Author_Institution :
Dept. of Comput. Sci., Hefei Univ. of Technol., Hefei, China
Abstract :
Improving file access performance is a major challenge in real-time cloud services. In this paper, we analyze the preconditions for addressing this problem with respect to requirements, hardware, software, and network environments in the cloud. We then describe the design and implementation of a novel distributed layered cache system built on top of the Hadoop Distributed File System, named the HDFS-based Distributed Cache System (HDCache). The cache system consists of a client library and multiple cache services. The cache services are designed with three access layers: an in-memory cache, a snapshot of the local disk, and the actual disk view as provided by HDFS. Files loaded from HDFS are cached in shared memory, which the client library can access directly. Multiple applications integrated with the client library can access a cache service simultaneously. Cache services are organized in a P2P style using a distributed hash table. Every cached file has three replicas on different cache service nodes in order to improve robustness and alleviate the workload. Experimental results show that the cache system can store files of widely varying sizes and achieves millisecond-level access performance in highly concurrent environments.
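The two mechanisms named in the abstract, the three access layers and the three replicas placed through a distributed hash table, can be illustrated with a minimal sketch. This is not the HDCache implementation: the class names `HashRing` and `LayeredCache`, the use of MD5, the clockwise replica walk, and the rule that a miss populates the faster layers are all assumptions made for illustration, and plain dictionaries stand in for shared memory, the local disk snapshot, and HDFS.

```python
import hashlib
from bisect import bisect_right


def _hash(key: str) -> int:
    """Map a string to a point on the hash ring (MD5 is an arbitrary choice)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring standing in for the distributed hash table."""

    def __init__(self, nodes):
        self._ring = sorted((_hash(n), n) for n in nodes)
        self._points = [point for point, _ in self._ring]

    def replicas(self, path, count=3):
        """Return `count` distinct nodes, walking clockwise from the file's hash."""
        count = min(count, len(self._ring))
        start = bisect_right(self._points, _hash(path))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


class LayeredCache:
    """Three access layers: memory, local disk snapshot, backing store."""

    def __init__(self, backing_store):
        self.memory = {}                    # stands in for the in-memory cache
        self.snapshot = {}                  # stands in for the local disk snapshot
        self.backing_store = backing_store  # stands in for HDFS

    def get(self, path):
        """Return (data, layer), trying the fastest layer first."""
        if path in self.memory:
            return self.memory[path], "memory"
        if path in self.snapshot:
            data, layer = self.snapshot[path], "snapshot"
        else:
            data, layer = self.backing_store[path], "hdfs"
            self.snapshot[path] = data      # assumption: a miss fills the snapshot
        self.memory[path] = data            # assumption: every read fills memory
        return data, layer
```

For example, `HashRing(["node1", "node2", "node3", "node4"]).replicas("/data/a.bin")` yields three distinct node names for that path, and repeated `get` calls on a `LayeredCache` show the serving layer moving from the backing store to memory.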
Keywords :
cache storage; cloud computing; disc storage; distributed databases; distributed shared memory systems; peer-to-peer computing; software libraries; table lookup; HDCache; HDFS-based distributed cache system; Hadoop distributed file system; P2P style; access layer; cache service; client library; disk; distributed hash table; distributed layered cache system; file access performance; hardware; in-memory cache; network environment; real-time cloud service; shared memory; software; Cloud computing; Data models; File systems; Libraries; Random access memory; Real-time systems; Servers; HDFS; cloud storage; distributed cache system; in-memory cloud; real-time file access
Conference_Title :
Grid Computing (GRID), 2012 ACM/IEEE 13th International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4673-2901-9
DOI :
10.1109/Grid.2012.17