Title :
HEDS: Hybrid deduplication approach for email servers
Author :
Kim, Daehee ; Choi, Baek-Young
Author_Institution :
Dept. Comput. Sci. Electr. Eng., Univ. of Missouri-Kansas City, Kansas City, MO, USA
Abstract :
While the volume of email is ever-increasing, email servers tend to hold large amounts of redundant data, consuming storage unnecessarily. MS Exchange single instance storage (SIS) improves storage efficiency by keeping a single copy of an entire email when it is destined for multiple recipients' mailboxes within a server. However, SIS does not remove redundancy among attachments or among parts of emails repeated across threads. On the other hand, existing deduplication systems mainly target backup systems and perform deduplication at the block level, which makes them difficult to use in in-line systems. We propose a hybrid scheme that adaptively performs deduplication at either file-level or chunk-level granularity. We have designed and implemented a hybrid email deduplication system, named HEDS, and evaluated it with real email datasets. We show that it achieves a high data reduction rate while keeping the CPU and memory overhead small; thus, it is feasible to use as an in-line system on email servers.
Keywords :
electronic mail; file servers; CPU; HEDS; MS exchange single instance; SIS; data reduction rate; email servers; hybrid deduplication approach; hybrid email deduplication system; in-line system; in-line systems; memory overhead; multiple recipients mailboxes; redundant attachments; storage-efficiency; Arrays; Electronic mail; Indexes; Postal services; Redundancy; Servers;
Conference_Title :
Ubiquitous and Future Networks (ICUFN), 2012 Fourth International Conference on
Conference_Location :
Phuket
Print_ISBN :
978-1-4673-1377-3
Electronic_ISBN :
2165-8528
DOI :
10.1109/ICUFN.2012.6261672