DocumentCode :
569050
Title :
Assuring Demanded Read Performance of Data Deduplication Storage with Backup Datasets
Author :
Nam, Young Jin ; Park, Dongchul ; Du, David H C
Author_Institution :
Sch. of Comput. & Inf. Technol., Daegu Univ., Gyeongsan, South Korea
fYear :
2012
fDate :
7-9 Aug. 2012
Firstpage :
201
Lastpage :
208
Abstract :
Data deduplication has been widely adopted in contemporary backup storage systems. It not only saves considerable storage space, but also significantly shortens data backup time. Since the major goal of the original data deduplication lies in saving storage space, its design has focused primarily on improving write performance by removing as much duplicate data as possible from incoming data streams. Although fast recovery from a system crash relies mainly on the read performance provided by deduplication storage, little investigation into read performance improvement has been made. In general, as the amount of deduplicated data increases, write performance improves accordingly, whereas the associated read performance becomes worse. In this paper, we propose a deduplication scheme that assures the demanded read performance of each data stream while keeping its write performance at a reasonable level, and can thereby guarantee a target system recovery time. For this, we first propose an indicator called cache-aware Chunk Fragmentation Level (CFL) that estimates degraded read performance on the fly by taking into account both incoming chunk information and read cache effects. We also show a strong correlation between this CFL and read performance in the backup datasets. In order to guarantee demanded read performance expressed in terms of a CFL value, we propose a read performance enhancement scheme called selective duplication that is activated whenever the current CFL becomes worse than the demanded one. The key idea is to judiciously write non-unique (shared) chunks into storage together with unique chunks unless the shared chunks exhibit good enough spatial locality. We quantify the spatial locality by using a selective duplication threshold value. Our experiments with actual backup datasets demonstrate that the proposed scheme achieves the demanded read performance in most cases at a reasonable cost in write performance.
Keywords :
data compression; CFL; backup datasets; chunk fragmentation level; contemporary backup storage systems; data deduplication storage; data streams; spatial locality; storage space; system recovery time; Containers; Correlation; Educational institutions; Indexing; Monitoring; Throughput; data deduplication; read performance; storage;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Modeling, Analysis & Simulation of Computer and Telecommunication Systems (MASCOTS), 2012 IEEE 20th International Symposium on
Conference_Location :
Washington, DC
ISSN :
1526-7539
Print_ISBN :
978-1-4673-2453-3
Type :
conf
DOI :
10.1109/MASCOTS.2012.32
Filename :
6298180