DocumentCode
3537653
Title
A Distributed Cache for Hadoop Distributed File System in Real-Time Cloud Services
Author
Zhang, Jing ; Wu, Gongqing ; Hu, Xuegang ; Wu, Xindong
Author_Institution
Dept. of Comput. Sci., Hefei Univ. of Technol., Hefei, China
fYear
2012
fDate
20-23 Sept. 2012
Firstpage
12
Lastpage
21
Abstract
Improving file access performance is a major challenge in real-time cloud services. In this paper, we analyze the preconditions for addressing this problem, considering requirements, hardware, software, and network environments in the cloud. We then describe the design and implementation of a novel distributed layered cache system built on top of the Hadoop Distributed File System (HDFS), named the HDFS-based Distributed Cache System (HDCache). The cache system consists of a client library and multiple cache services. The cache services are designed with three access layers: an in-memory cache, a snapshot of the local disk, and the actual disk view as provided by HDFS. Files loaded from HDFS are cached in shared memory, which the client library can access directly. Multiple applications integrated with the client library can access a cache service simultaneously. Cache services are organized in a P2P style using a distributed hash table. Every cached file has three replicas on different cache service nodes to improve robustness and alleviate the workload. Experimental results show that the cache system can store files of widely varying sizes and delivers millisecond-level access performance in highly concurrent environments.
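The layered lookup and three-way replica placement described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class names `Ring` and `LayeredCache`, the use of MD5 as the ring hash, the clockwise successor rule for choosing replicas, and the promotion of data into faster layers on a miss are all assumptions made for illustration, and plain dictionaries stand in for shared memory, the local disk snapshot, and HDFS.

```python
import bisect
import hashlib


def _h(key: str) -> int:
    """Map a string to a point on the hash ring (MD5 is an assumed choice)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy consistent-hash ring standing in for the distributed hash table."""

    def __init__(self, nodes):
        # Sorted (point, node) pairs; one point per node for simplicity.
        self._points = sorted((_h(n), n) for n in nodes)

    def replicas(self, path: str, count: int = 3):
        """Return `count` distinct nodes, walking clockwise from the key."""
        count = min(count, len(self._points))
        start = bisect.bisect_left(self._points, (_h(path), ""))
        return [self._points[(start + i) % len(self._points)][1]
                for i in range(count)]


class LayeredCache:
    """Three access layers: memory, local snapshot, backing store."""

    def __init__(self, backing_store):
        self.memory = {}                    # stands in for shared memory
        self.snapshot = {}                  # stands in for the local disk snapshot
        self.backing_store = backing_store  # stands in for HDFS

    def read(self, path: str):
        """Return (data, layer) and promote data into the faster layers."""
        if path in self.memory:
            return self.memory[path], "memory"
        if path in self.snapshot:
            data = self.snapshot[path]
            self.memory[path] = data
            return data, "snapshot"
        data = self.backing_store[path]     # raises KeyError if the file is absent
        self.snapshot[path] = data
        self.memory[path] = data
        return data, "hdfs"
```

In this sketch, a client would first call `Ring.replicas(path)` to find the three cache service nodes responsible for a file, then call `read(path)` on one of them; repeated reads of the same path are served from the in-memory layer.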
Keywords
cache storage; cloud computing; disc storage; distributed databases; distributed shared memory systems; peer-to-peer computing; software libraries; table lookup; HDCache; HDFS-based distributed cache system; Hadoop distributed file system; P2P style; access layer; cache service; client library; disk; distributed hash table; distributed layered cache system; file access performance; hardware; in-memory cache; network environment; real-time cloud service; shared memory; software; Cloud computing; Data models; File systems; Libraries; Random access memory; Real-time systems; Servers; HDFS; cloud storage; distributed cache system; in-memory cloud; real-time file access
fLanguage
English
Publisher
IEEE
Conference_Titel
Grid Computing (GRID), 2012 ACM/IEEE 13th International Conference on
Conference_Location
Beijing
ISSN
1550-5510
Print_ISBN
978-1-4673-2901-9
Type
conf
DOI
10.1109/Grid.2012.17
Filename
6319150