DocumentCode
1783311
Title
POD: Performance Oriented I/O Deduplication for Primary Storage Systems in the Cloud
Author
Bo Mao ; Hong Jiang ; Suzhen Wu ; Lei Tian
Author_Institution
Software Sch., Xiamen Univ., Xiamen, China
fYear
2014
fDate
19-23 May 2014
Firstpage
767
Lastpage
776
Abstract
Recent studies have shown that moderate to high data redundancy exists in primary storage systems in the Cloud. Our experimental studies reveal that data redundancy is much more intense on the I/O path than on disks, owing to the relatively high temporal access locality of small I/O requests to redundant data. We also observe that directly applying data deduplication to primary storage systems in the Cloud will likely cause space contention in memory and data fragmentation on disks. Based on these observations, we propose a Performance-Oriented I/O Deduplication approach, called POD, in contrast to capacity-oriented I/O deduplication approaches such as iDedup, to improve the I/O performance of primary storage systems in the Cloud without sacrificing capacity savings. The salient feature of POD is that it targets not only the capacity-sensitive large writes and files, as iDedup does, but also the small writes and files that are performance-sensitive yet capacity-insensitive. Experiments conducted on our lightweight prototype implementation of POD show that POD significantly outperforms iDedup in I/O performance by up to 87.9%, with an average of 58.8%. Moreover, our evaluation results show that POD achieves comparable or better capacity savings than iDedup.
Keywords
cloud computing; data handling; redundancy; storage management; I/O path intensity; POD; capacity-oriented I/O deduplication approach; cloud computing; data deduplication; data fragmentation; high data redundancy; high temporal access locality; iDedup; performance-oriented I/O deduplication approach; primary storage systems; space contention; Cache storage; Educational institutions; Indexes; Memory management; Monitoring; Prototypes; Redundancy; Data Redundancy; I/O Deduplication; I/O Performance; Primary Storage; Storage Capacity;
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel and Distributed Processing Symposium, 2014 IEEE 28th International
Conference_Location
Phoenix, AZ
ISSN
1530-2075
Print_ISBN
978-1-4799-3799-8
Type
conf
DOI
10.1109/IPDPS.2014.84
Filename
6877308