Building Highly Available Cluster File System Based on Replication

Author

Cao, Liang ; Wang, Yu ; Xiong, Jin

Author_Institution

Inst. of Comput. Technol., Chinese Acad. of Sci., Beijing, China

fYear

2009

fDate

8-11 Dec. 2009

Firstpage

94

Lastpage

101

Abstract

To improve cost-effectiveness, current large-scale storage systems are typically built from thousands of individual components. As systems scale up, the probability that multiple components fail increases, so for large-scale storage systems failures are the norm rather than the exception. Building file systems that provide both high throughput and highly available service under such circumstances is a big challenge. We have designed and implemented HA-DCFS3, a highly available cluster file system prototype. It uses a scalable replication algorithm called the asynchronous primary copy protocol (APCP). Unlike the traditional primary copy protocol, which must synchronize updates to all replicas, APCP takes an asynchronous approach in which a write operation may be synchronized to only a subset of replicas. This flexible approach greatly improves write performance. Furthermore, HA-DCFS3 introduces a fine-grained failure detection mechanism called "data path detection", which is integrated into the fault-tolerant framework based on data replication. Hence, HA-DCFS3 can provide continuous service even when component failures occur. Finally, HA-DCFS3 adopts a two-level data recovery strategy that handles transient failures with reintegration and persistent failures with re-replication, reducing the cost of data repair. Our performance results show that HA-DCFS3 delivers high and scalable aggregate performance and provides highly available service at very low cost.
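
The abstract gives no protocol details, so the following is only a minimal sketch of the general idea behind a primary copy scheme that synchronizes a write to a subset of replicas and propagates it to the rest later. The class names, the `sync_count` parameter, and the in-memory `pending` queue are all hypothetical illustrations, not the HA-DCFS3 implementation.

```python
from collections import deque


class Replica:
    """Hypothetical in-memory replica holding key -> data blocks."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def apply(self, key, data):
        self.blocks[key] = data


class Primary(Replica):
    """Hypothetical primary: syncs each write to a subset, defers the rest."""

    def __init__(self, name, backups, sync_count):
        super().__init__(name)
        self.backups = backups        # all secondary replicas
        self.sync_count = sync_count  # assumed knob: replicas updated synchronously
        self.pending = deque()        # updates still owed to lagging replicas

    def write(self, key, data):
        # Apply locally, then update only the first sync_count backups
        # before acknowledging the write.
        self.apply(key, data)
        for replica in self.backups[:self.sync_count]:
            replica.apply(key, data)
        # The remaining replicas are brought up to date later.
        for replica in self.backups[self.sync_count:]:
            self.pending.append((replica, key, data))
        # Number of copies that are current when the write is acknowledged.
        return 1 + min(self.sync_count, len(self.backups))

    def flush(self):
        # Background propagation of deferred updates; returns how many were sent.
        sent = 0
        while self.pending:
            replica, key, data = self.pending.popleft()
            replica.apply(key, data)
            sent += 1
        return sent
```

In this sketch, a smaller `sync_count` shortens the acknowledged write path at the price of more replicas being temporarily stale until `flush` runs; how the real system chooses the subset and handles failures during propagation is not stated in the abstract.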

Keywords

fault tolerant computing; storage management; HA-DCFS3; asynchronous primary copy protocol; cluster file system; cost-effectiveness; data path detection; fault-tolerant framework; fine-grained failure detection; large-scale storage system; scalable replication algorithm; Aggregates; Clustering algorithms; Costs; Fault detection; Fault tolerance; File systems; Large-scale systems; Protocols; Prototypes; Throughput; data availability; data replication; fault-tolerance; performance; primary copy;

fLanguage

English

Publisher

IEEE

Conference_Titel

Parallel and Distributed Computing, Applications and Technologies, 2009 International Conference on

Conference_Location

Higashi Hiroshima

Print_ISBN

978-0-7695-3914-0

Type

conf

DOI

10.1109/PDCAT.2009.14

Filename

5372817