Title :
Automatic recovery from disk failure in continuous-media servers
Author :
Lee, Jack Y B ; Lui, John C S
Author_Institution :
Dept. of Inf. Eng., Chinese Univ. of Hong Kong, Shatin, China
fDate :
5/1/2002
Abstract :
Continuous-media (CM) servers have been around for some years. Apart from server capacity, another important issue in the deployment of CM servers is reliability. This study investigates rebuild algorithms for automatically rebuilding data stored on a failed disk onto a spare disk. Specifically, a block-based rebuild algorithm is studied with the rebuild time and buffer requirement modeled. A buffer-sharing scheme is then proposed to eliminate the additional buffers needed by the rebuild process. To further improve rebuild performance, a track-based rebuild algorithm that rebuilds lost data in tracks is proposed and analyzed. Results show that track-based rebuild, while it substantially outperforms block-based rebuild, requires significantly more buffers (17-135 percent more) even with buffer sharing. To tackle this problem, a novel pipelined rebuild algorithm is proposed to take advantage of the sequential property of track retrievals to pipeline the reading and writing processes. This pipelined rebuild algorithm achieves the same rebuild performance as track-based rebuild, but reduces the extra buffer requirement to insignificant levels (0.7-1.9 percent). Numerical results computed using models of five commercial disk drives demonstrate that automatic rebuild of a failed disk can be done in a reasonable amount of time, even at relatively high server utilization (e.g., less than 1.5 hours at 90 percent utilization).
Keywords :
buffer storage; fault tolerant computing; multimedia computing; multimedia servers; pipeline processing; automatic disk failure recovery; block-based rebuild algorithm; buffer requirement; buffer-sharing scheme; continuous media servers; disk drives; pipelined rebuild algorithm; reading processes; rebuild time; reliability; spare disk; track retrievals; track-based rebuild algorithm; writing processes; Algorithm design and analysis; Degradation; Disk drives; Fault tolerance; Information retrieval; Lifting equipment; Performance analysis; Pipelines; Streaming media; Writing
Journal_Title :
Parallel and Distributed Systems, IEEE Transactions on
DOI :
10.1109/TPDS.2002.1003860