DocumentCode
3172603
Title
Effect of Latent Errors on the Reliability of Data Storage Systems
Author
Venkatesan, V.; Iliadis, Ilias
Author_Institution
IBM Res. - Zurich, Ruschlikon, Switzerland
fYear
2013
fDate
14-16 Aug. 2013
Firstpage
293
Lastpage
297
Abstract
The reliability of data storage systems is adversely affected by the presence of latent sector errors. As the number of occurrences of such errors increases with the storage capacity, latent sector errors have become more prevalent in today's high-capacity storage devices. Such errors are typically not detected until an attempt is made to read the affected sectors. When a latent sector error is detected, the redundant data corresponding to the affected sector is used to recover its data. However, if no such redundant data is available, then the data of the affected sector is irrecoverably lost from the storage system. Therefore, the reliability of data storage systems is affected by both the complete failure of storage nodes and the latent sector errors within them. In this article, closed-form expressions for the mean time to data loss (MTTDL) of erasure-coded storage systems in the presence of latent errors are derived. The effect of latent errors on systems with various types of redundancy, data placement, and sector error probabilities is studied. For small latent sector error probabilities, it is shown that the MTTDL is reduced by a factor that is independent of the number of parities in the data redundancy scheme as well as the number of nodes in the system. However, for large latent sector error probabilities, the MTTDL is similar to that of a system using a data redundancy scheme with one fewer parity. The reduction of the MTTDL in the latter case is more pronounced than in the former one.
Keywords
error statistics; redundancy; storage management; system recovery; MTTDL; closed-form expressions; data placement; data redundancy scheme; data storage system reliability; erasure coded storage systems; high capacity storage devices; latent sector error probabilities; mean time to data loss; storage capacity; storage node failure; Analytical models; Computational modeling; Computers; Data models; Data storage systems; Redundancy; codeword placement; data storage; declustered; latent errors; reliability
fLanguage
English
Publisher
IEEE
Conference_Titel
Modeling, Analysis & Simulation of Computer and Telecommunication Systems (MASCOTS), 2013 IEEE 21st International Symposium on
Conference_Location
San Francisco, CA
ISSN
1526-7539
Type
conf
DOI
10.1109/MASCOTS.2013.38
Filename
6730773