Automatic recovery from disk failure in continuous-media servers

Author

Lee, Jack Y B ; Lui, John C S

Author_Institution

Dept. of Inf. Eng., Chinese Univ. of Hong Kong, Shatin, China

Volume

13

Issue

5

fYear

2002

fDate

5/1/2002 12:00:00 AM

Firstpage

499

Lastpage

515

Abstract

Continuous-media (CM) servers have been around for some years. Apart from server capacity, another important issue in the deployment of CM servers is reliability. This study investigates rebuild algorithms for automatically rebuilding data stored in a failed disk into a spare disk. Specifically, a block-based rebuild algorithm is studied with the rebuild time and buffer requirement modeled. A buffer-sharing scheme is then proposed to eliminate the additional buffers needed by the rebuild process. To further improve rebuild performance, a track-based rebuild algorithm that rebuilds lost data in tracks is proposed and analyzed. Results show that track-based rebuild, while it substantially outperforms block-based rebuild, requires significantly more buffers (17-135 percent more) even with buffer sharing. To tackle this problem, a novel pipelined rebuild algorithm is proposed to take advantage of the sequential property of track retrievals to pipeline the reading and writing processes. This pipelined rebuild algorithm achieves the same rebuild performance as track-based rebuild, but reduces the extra buffer requirement to insignificant levels (0.7-1.9 percent). Numerical results computed using models of five commercial disk drives demonstrate that automatic rebuild of a failed disk can be done in a reasonable amount of time, even at relatively high server utilization (e.g., less than 1.5 hours at 90 percent utilization).

Keywords

buffer storage; fault tolerant computing; multimedia computing; multimedia servers; pipeline processing; automatic disk failure recovery; block-based rebuild algorithm; buffer requirement; buffer-sharing scheme; continuous media servers; disk drives; pipelined rebuild algorithm; reading processes; rebuild time; reliability; spare disk; track retrievals; track-based rebuild algorithm; writing processes; Algorithm design and analysis; Degradation; Disk drives; Fault tolerance; Information retrieval; Lifting equipment; Performance analysis; Pipelines; Streaming media; Writing

fLanguage

English

Journal_Title

Parallel and Distributed Systems, IEEE Transactions on

Publisher

IEEE

ISSN

1045-9219

Type

jour

DOI

10.1109/TPDS.2002.1003860

Filename

1003860