Title :
A hierarchical clustering method for big data oriented ciphertext search
Author :
Chi Chen ; Xiaojie Zhu ; Peisong Shen ; Jiankun Hu
Author_Institution :
State Key Lab. of Inf. Security, Inst. of Inf. Eng., Beijing, China
fDate :
April 27, 2014 - May 2, 2014
Abstract :
Following the wide use of cloud services, the volume of data stored in data centers has grown dramatically, which makes real-time information retrieval much more difficult than before. Furthermore, text information is usually encrypted before being outsourced to data centers in order to protect users' data privacy. Current techniques for searching encrypted data do not perform well in such a massive data environment. In this paper, a hierarchical clustering method for ciphertext search in a big data environment is proposed. The proposed approach clusters the documents based on a minimum similarity threshold, and then partitions the resulting clusters into sub-clusters until the constraint on the maximum cluster size is satisfied. In the search phase, this approach achieves linear computational complexity against an exponential increase in the size of the document collection. In addition, the retrieved documents are more closely related to each other than those retrieved by traditional methods. An experiment was conducted on a collection built from the past ten years' IEEE INFOCOM publications, comprising about 3000 documents with nearly 5300 keywords. The results validate the proposed approach.
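The two-phase idea in the abstract (cluster by a minimum similarity threshold, then split oversized clusters until a maximum size is met, and descend the resulting tree at search time) can be illustrated with the minimal sketch below. It is not the authors' algorithm: the abstract does not give their similarity measure, splitting rule, or encrypted index format, so this sketch assumes plaintext sparse keyword vectors, cosine similarity, a greedy single-pass threshold clustering, and re-clustering with a threshold raised by a hypothetical `step` until a cluster separates. All names (`threshold_cluster`, `build_index`, `search`, `step`) are hypothetical.

```python
import math
from typing import Dict, List

# Assumption: each document is a plaintext sparse keyword -> weight vector.
# The paper searches over encrypted indexes; this only shows the tree structure.
Vector = Dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vecs: List[Vector]) -> Vector:
    # Component-wise mean of the vectors.
    c: Vector = {}
    for v in vecs:
        for k, w in v.items():
            c[k] = c.get(k, 0.0) + w / len(vecs)
    return c


def threshold_cluster(ids: List[int], docs: List[Vector], min_sim: float) -> List[List[int]]:
    """Greedy pass: put each document in the most similar existing cluster if its
    centroid similarity is at least min_sim; otherwise open a new cluster."""
    clusters: List[List[int]] = []
    cents: List[Vector] = []
    for i in ids:
        sims = [cosine(docs[i], c) for c in cents]
        j = max(range(len(sims)), key=sims.__getitem__, default=-1)
        if j >= 0 and sims[j] >= min_sim:
            clusters[j].append(i)
            cents[j] = centroid([docs[k] for k in clusters[j]])
        else:
            clusters.append([i])
            cents.append(dict(docs[i]))
    return clusters


def build(ids: List[int], docs: List[Vector], min_sim: float, max_size: int, step: float = 0.1) -> dict:
    """Tree node; a cluster larger than max_size is split into sub-clusters by
    re-clustering with a stricter threshold (hypothetical splitting rule)."""
    node = {"centroid": centroid([docs[i] for i in ids]), "docs": ids, "children": []}
    if len(ids) <= max_size:
        return node
    groups = [ids]
    while len(groups) == 1:
        min_sim += step
        if min_sim > 1.0:
            # Similarity can no longer separate these documents: fall back to chunking.
            groups = [ids[k:k + max_size] for k in range(0, len(ids), max_size)]
        else:
            groups = threshold_cluster(ids, docs, min_sim)
    node["children"] = [build(g, docs, min_sim, max_size, step) for g in groups]
    return node


def build_index(docs: List[Vector], min_sim: float, max_size: int) -> dict:
    all_ids = list(range(len(docs)))
    top = threshold_cluster(all_ids, docs, min_sim)
    children = [build(g, docs, min_sim, max_size) for g in top]
    return {"centroid": centroid(docs), "docs": all_ids, "children": children}


def search(root: dict, docs: List[Vector], query: Vector, k: int) -> List[int]:
    """Follow the most similar child at each level, then rank the leaf's documents."""
    node = root
    while node["children"]:
        node = max(node["children"], key=lambda c: cosine(query, c["centroid"]))
    ranked = sorted(node["docs"], key=lambda i: cosine(query, docs[i]), reverse=True)
    return ranked[:k]
```

For example, with four toy documents, two about cloud storage and two about ciphers, `build_index(docs, 0.5, 2)` yields two top-level clusters, and a query on `"cipher"` only compares against documents in the cipher cluster. Because search follows a single child per level, only a small part of the collection is scored, which is the intuition behind the efficiency claim; the sketch makes no claim about matching the paper's stated complexity.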
Keywords :
Big Data; cloud computing; computational complexity; cryptography; data privacy; information retrieval; pattern clustering; text analysis; Big Data oriented ciphertext search; IEEE INFOCOM publications; cloud services; data center; document clustering; document collection; document retrieval; encrypted data; hierarchical clustering method; linear computational complexity; minimum similarity threshold; real-time information retrieval; subclusters; text information encryption; user data privacy protection; Big data; Conferences; Cryptography; Equations; Indexes; Servers; Vectors; ciphertext retrieval; cloud computing; hierarchical clustering; multi-keyword ranked search;
Conference_Title :
Computer Communications Workshops (INFOCOM WKSHPS), 2014 IEEE Conference on
Conference_Location :
Toronto, ON
DOI :
10.1109/INFCOMW.2014.6849292