Author_Institution :
Sch. of Inf. Sci. & Eng., Central South Univ., Changsha, China
Abstract :
Cloud computing has emerged as a new commercial paradigm. In a typical cloud service, each file stored in the cloud is described by several keywords. By querying the cloud with certain keywords, a user can retrieve the files whose keywords match the query. An organization with thousands of users querying the cloud can deploy multiple proxy servers internally to reduce the querying cost. The users are classified into groups, and the users in a group send their queries to the same proxy server, which queries the cloud with a combined query, i.e., the union of the keywords in that group's queries. In such an environment, an important problem is cost efficiency, i.e., how to classify users into groups so that the total number of returned files is minimized. Observing that this number is mainly determined by the number of keywords in the combined queries, we translate the problem, in the case of k proxy servers, into classifying n users into k groups so that the total number of keywords in the k combined queries is minimized. Since more keywords shared within a group of queries yield fewer keywords in the combined query, users whose queries share the most keywords should be grouped together. Two additional aspects must also be addressed: load balancing, i.e., the workloads among proxy servers are balanced, and robustness, i.e., each user still obtains search results even if some proxy servers fail. To solve these problems simultaneously, we propose mathematic grouping and heuristic grouping strategies: mathematic grouping solves a relaxed version of the problem using a local optimization method, and heuristic grouping is based on K-means, the classical heuristic clustering algorithm. Extensive evaluations have been conducted on the analytical model to verify the effectiveness of our strategies.
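To make the optimization objective concrete, the sketch below computes the cost of a grouping (the total number of keywords across the k combined queries, each combined query being the union of its group's keyword sets) and builds a grouping with a simple capacity-limited greedy heuristic. This is a hypothetical illustration, not the mathematic grouping or K-means-based strategy proposed in the abstract: the function names `combined_cost` and `greedy_grouping`, the ceil(n/k) capacity used as a stand-in for load balancing, and the largest-query-first ordering are all assumptions, and robustness to proxy failures is not modeled.

```python
import math
from typing import List, Set


def combined_cost(groups: List[List[Set[str]]]) -> int:
    # Each proxy sends the union of its users' keywords to the cloud;
    # the objective is the total keyword count over all k combined queries.
    return sum(len(set().union(*g)) for g in groups)


def greedy_grouping(queries: List[Set[str]], k: int) -> List[List[int]]:
    # Hypothetical heuristic (not the paper's method): assign users one at a
    # time to the group whose combined query grows the least.
    n = len(queries)
    cap = math.ceil(n / k)  # assumed load-balancing cap: at most ceil(n/k) users per proxy
    groups: List[List[int]] = [[] for _ in range(k)]
    unions: List[Set[str]] = [set() for _ in range(k)]
    # Place users with larger queries first (a common greedy ordering).
    order = sorted(range(n), key=lambda i: -len(queries[i]))
    for i in order:
        # Among non-full groups, minimize the number of new keywords added;
        # break ties by smaller group size, then by group index.
        best = min(
            (g for g in range(k) if len(groups[g]) < cap),
            key=lambda g: (len(queries[i] - unions[g]), len(groups[g]), g),
        )
        groups[best].append(i)
        unions[best] |= queries[i]
    return groups
```

For example, with queries `[{"cloud", "storage"}, {"cloud", "proxy"}, {"kmeans", "cluster"}, {"cluster", "kmeans", "data"}]` and k = 2, the greedy pass groups users 3 and 2 together and users 0 and 1 together, giving a total of 6 keywords; the alternative grouping `[[0, 2], [1, 3]]` would cost 9. Grouping users with shared keywords lowers the cost, which is the intuition the abstract relies on.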
Keywords :
cloud computing; learning (artificial intelligence); optimisation; pattern clustering; query processing; user interfaces; K-means clustering algorithm; cloud computing; cloud query; cloud service; dynamic grouping strategy; file storage; heuristic clustering algorithm; heuristic grouping strategy; keyword; local optimization method; mathematic grouping strategy; proxy server; query group; querying cost reduction; user query; Bandwidth; Dictionaries; Educational institutions; Load management; Mathematics; Robustness; Servers; Cloud computing; cost efficiency; dynamic grouping; load balancing; robustness;