DocumentCode :
688231
Title :
Exploiting Fingerprint Prefetching to Improve the Performance of Data Deduplication
Author :
Liangshan Song ; Yuhui Deng ; Junjie Xie
Author_Institution :
Dept. of Comput. Sci., Jinan Univ., Guangzhou, China
fYear :
2013
fDate :
13-15 Nov. 2013
Firstpage :
849
Lastpage :
856
Abstract :
Data deduplication has become an important and economical way to remove redundant data segments, thus alleviating the pressure incurred by the large amounts of data that need to be stored. Fingerprints are used to represent and identify identical data blocks when performing data deduplication. However, the number of fingerprints grows as the amount of data increases. Because memory is limited, the fingerprints have to be stored on disk drives. When a fingerprint lookup cannot be satisfied in memory, disk I/Os are generated to fetch the on-disk fingerprints. This results in small, random I/Os that significantly degrade the performance of data deduplication. This paper introduces a fingerprint prefetching algorithm that leverages file similarity and data locality. On the one hand, we present a similar-file recognition algorithm to identify similar files, that is, files that differ by some modifications but share a large portion of identical data blocks. On the other hand, the on-disk fingerprints are organized according to the sequence of the data streams, thus preserving data locality to improve the cache hit ratio. The proposed prefetching algorithm requests fingerprints from disk drives and places them in memory before they are actually needed. This significantly improves the cache hit ratio when the fingerprints are needed, thus enhancing the performance of data deduplication. Two real data sets that represent typical cloud storage and cloud computing scenarios are collected to evaluate the effectiveness of the proposed approach.
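The locality half of the idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class `PrefetchingIndex`, its parameters `segment_size` and `cache_segments`, and the use of SHA-1 as the fingerprint function are all assumptions made for illustration, and the similar-file recognition step is not modeled. Fingerprints are appended to "on-disk" segments in stream order, and a cache miss loads the whole segment so that the fingerprints that follow in the stream are already in memory.

```python
import hashlib
from collections import OrderedDict


def fingerprint(chunk: bytes) -> str:
    """Hypothetical fingerprint function: SHA-1 of the chunk contents."""
    return hashlib.sha1(chunk).hexdigest()


class PrefetchingIndex:
    """Toy fingerprint index; all names and parameters are illustrative."""

    def __init__(self, segment_size: int = 4, cache_segments: int = 2):
        self.segment_size = segment_size
        self.cache_segments = cache_segments
        self.disk_segments = []     # fingerprints grouped in stream order
        self.disk_location = {}     # fingerprint -> segment id (stands in for the on-disk index)
        self.cache = OrderedDict()  # segment id -> set of fingerprints, kept in LRU order
        self.segment_loads = 0      # number of simulated segment reads from disk

    def _cached(self, fp: str) -> bool:
        for seg_id, fps in list(self.cache.items()):
            if fp in fps:
                self.cache.move_to_end(seg_id)  # mark segment as recently used
                return True
        return False

    def _prefetch(self, seg_id: int) -> None:
        # Load the whole segment: neighbours in the stream arrive with the hit.
        self.segment_loads += 1
        self.cache[seg_id] = set(self.disk_segments[seg_id])
        self.cache.move_to_end(seg_id)
        while len(self.cache) > self.cache_segments:
            self.cache.popitem(last=False)      # evict least recently used segment

    def _append(self, fp: str) -> None:
        # Store a new fingerprint at the tail, preserving stream order.
        if not self.disk_segments or len(self.disk_segments[-1]) >= self.segment_size:
            self.disk_segments.append([])
        self.disk_segments[-1].append(fp)
        self.disk_location[fp] = len(self.disk_segments) - 1

    def is_duplicate(self, fp: str) -> bool:
        if self._cached(fp):
            return True
        seg_id = self.disk_location.get(fp)
        if seg_id is None:
            self._append(fp)                    # unique chunk: record its fingerprint
            return False
        self._prefetch(seg_id)                  # duplicate on disk: pull in its segment
        return True


def dedup_stream(index: PrefetchingIndex, chunks) -> list:
    """Return one duplicate/unique flag per chunk in the stream."""
    return [index.is_duplicate(fingerprint(c)) for c in chunks]
```

Under these assumptions, writing eight distinct chunks and then replaying the same stream triggers only two segment loads for eight duplicate lookups, because each miss brings in the next three fingerprints of the stream. The counter ignores the cost of the lookup for unique fingerprints, which a real system would also need to address.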
Keywords :
cache storage; cloud computing; data handling; disc drives; fingerprint identification; storage management; cache hit ratio; cloud computing scenarios; cloud storage; data deduplication; data locality; data sets; data streams; disk I/O; disk drives; file recognition algorithm; file similarity; fingerprint prefetching algorithm; identical data blocks; limited memory size; on-disk fingerprints; random I/O; redundant data segments; Cloud computing; Data compression; Fingerprint recognition; Indexes; Prefetching; Random access memory; Throughput; deduplication; disk bottleneck; file similarity; fingerprint prefetching; locality;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing (HPCC_EUC), 2013 IEEE 10th International Conference on
Conference_Location :
Zhangjiajie
Type :
conf
DOI :
10.1109/HPCC.and.EUC.2013.122
Filename :
6832004