Title :
Data De-duplication on Similar File Detection
Author :
Yueguang Zhu ; Xingjun Zhang ; Runting Zhao ; Xiaoshe Dong
Author_Institution :
Dept. of Comput. Sci. & Technol., Xi'an Jiaotong Univ., Xi'an, China
Abstract :
At present, block-level data de-duplication faces bottlenecks in metadata management and read/write rate. The traditional way to achieve a higher de-duplication elimination ratio is to expand the range of data considered for de-duplication, but doing so lengthens metadata fields and increases the number of metadata entries. When redundant data is detected, metadata must be repeatedly swapped into and out of memory, which creates an access bottleneck. It is therefore necessary to detect similar documents in order to classify the data that is valuable for de-duplication. In this paper, we propose a new block-level data de-duplication method combined with similar file detection. While maintaining the de-duplication elimination ratio, we narrow the range of data to reduce the metadata and eliminate performance bottlenecks. We present a detailed evaluation of our method against existing data de-duplication methods and show that our method meets its design goals: it improves the de-duplication ratio while reducing overhead costs.
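The idea of narrowing the de-duplication range to groups of similar files can be illustrated with a small sketch. The abstract does not say how the authors detect similarity, chunk files, or assign files to groups, so everything below is an assumption for illustration only: fixed-size blocks, SHA-256 block fingerprints, a bottom-k min-hash estimate of resemblance (a standard technique, swapped in here), and greedy first-match grouping. The names `blocks`, `sketch`, `resemblance`, `deduplicate`, `BLOCK_SIZE`, `SKETCH_K` and `THRESHOLD` are hypothetical and do not come from the paper.

```python
from __future__ import annotations

import hashlib

# Hypothetical parameters chosen for a readable demo, not taken from the paper.
BLOCK_SIZE = 4   # bytes per fixed-size block
SKETCH_K = 4     # fingerprints kept in each file's bottom-k sketch
THRESHOLD = 0.5  # minimum estimated resemblance to join an existing group


def blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Split data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(block: bytes) -> str:
    """SHA-256 hex digest that identifies a block in an index."""
    return hashlib.sha256(block).hexdigest()


def sketch(data: bytes, k: int = SKETCH_K) -> set[str]:
    """Bottom-k sketch: the k smallest distinct block fingerprints of a file."""
    return set(sorted({fingerprint(b) for b in blocks(data)})[:k])


def resemblance(a: set[str], b: set[str], k: int = SKETCH_K) -> float:
    """Bottom-k estimate of the Jaccard similarity of two files' block sets."""
    smallest = sorted(a | b)[:k]
    if not smallest:
        return 0.0
    shared = sum(1 for h in smallest if h in a and h in b)
    return shared / len(smallest)


def deduplicate(files: dict[str, bytes]) -> list[dict]:
    """Group similar files, then de-duplicate blocks only within each group."""
    groups: list[dict] = []
    for name, data in files.items():
        sk = sketch(data)
        # Similar file detection (assumed policy): join the first group whose
        # representative sketch resembles this file enough, else open a new one.
        group = next(
            (g for g in groups if resemblance(sk, g["sketch"]) >= THRESHOLD), None
        )
        if group is None:
            group = {"sketch": sk, "index": set(), "files": []}
            groups.append(group)
        group["files"].append(name)
        # Block-level de-duplication against this group's index only, so each
        # index holds fewer entries than a single global index would.
        for b in blocks(data):
            group["index"].add(fingerprint(b))
    return groups


if __name__ == "__main__":
    demo = {
        "a.txt": b"AAAABBBBCCCCDDDD",
        "b.txt": b"AAAABBBBCCCCEEEE",
        "c.txt": b"WWWWXXXXYYYYZZZZ",
    }
    for g in deduplicate(demo):
        print(g["files"], "unique blocks stored:", len(g["index"]))
```

With the demo input, `a.txt` and `b.txt` share three of their four blocks and land in one group whose index holds 5 fingerprints, while `c.txt` forms its own group with 4. The trade-off this sketch makes visible is that a duplicate block shared across two different groups would be stored twice; restricting lookups to a group buys smaller indexes at the cost of missing such cross-group redundancy.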
Keywords :
Big Data; meta data; pattern classification; block level data deduplication; metadata management; redundant data; similar document detection; similar file detection; valuable data classification; Acceleration; Arrays; Partitioning algorithms; Servers; Web and internet services; data de-duplication; de-duplication elimination ratio; metadata
Conference_Title :
Innovative Mobile and Internet Services in Ubiquitous Computing (IMIS), 2014 Eighth International Conference on
Conference_Location :
Birmingham
Print_ISBN :
978-1-4799-4333-3
DOI :
10.1109/IMIS.2014.9