DocumentCode
2235954
Title
Fault Tolerance of Tornado Codes for Archival Storage
Author
Woitaszek, Matthew ; Tufo, Henry M.
Author_Institution
Colorado Univ., Boulder, CO
fYear
2006
fDate
0-0 0
Firstpage
83
Lastpage
92
Abstract
This paper examines a class of low-density parity-check (LDPC) erasure codes called Tornado codes for applications in archival storage systems. The fault tolerance of Tornado code graphs is analyzed, and it is shown that worst-case failure scenarios in small (96-node) graphs can be identified and mitigated by using simulations to find and eliminate critical node sets that can cause Tornado codes to fail even when almost all blocks are present. The graph construction procedure resulting from this analysis is then used to construct a 96-device Tornado code storage system with capacity overhead equivalent to RAID 10 that tolerates any 4 device failures. This system is demonstrated to be superior to other parity-based RAID systems. Finally, it is described how a geographically distributed data stewarding system can be enhanced by using cooperatively selected Tornado code graphs to obtain fault tolerance exceeding that of its constituent storage sites or site replication strategies.
Keywords
RAID; fault tolerant computing; graph theory; parity check codes; storage management; LDPC erasure code; Tornado code graph; archival storage system; distributed data stewarding system; fault tolerance; low density parity check; parity-based RAID system; Analytical models; Availability; Failure analysis; Fault diagnosis; Fault tolerance; Fault tolerant systems; Information retrieval; Parity check codes; Throughput; Tornadoes;
fLanguage
English
Publisher
IEEE
Conference_Titel
High Performance Distributed Computing, 2006 15th IEEE International Symposium on
Conference_Location
Paris
ISSN
1082-8907
Print_ISBN
1-4244-0307-3
Type
conf
DOI
10.1109/HPDC.2006.1652139
Filename
1652139