Title :
Constrained Skyline Query Processing against Distributed Data Sites
Author :
Chen, Lijiang ; Cui, Bin ; Lu, Hua
Author_Institution :
Key Lab. of High Confidence Software Technol., Peking Univ., Beijing, China
Abstract :
The skyline of a multidimensional point set is a subset of interesting points that are not dominated by others. In this paper, we investigate constrained skyline queries in a large-scale unstructured distributed environment, where relevant data are distributed among geographically scattered sites. We first propose a partition algorithm that divides all data sites into incomparable groups such that the skyline computations in all groups can be parallelized without changing the final result. We then develop a novel algorithm framework called PaDSkyline for parallel skyline query processing among partitioned site groups. We also employ intragroup optimization and a multifiltering technique to improve skyline query processing within each group. In particular, multiple (local) skyline points are sent together with the query as filtering points, which help identify unqualified local skyline points early on a data site. In this way, the amount of data to be transmitted via network connections is reduced, and thus the overall query response time is shortened further. Cost models and heuristics are proposed to guide the selection of a given number of filtering points from a superset. A cost-efficient model is developed to determine how many filtering points to use for a particular data site. The results of an extensive experimental study demonstrate that our proposals are effective and efficient.
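The notions of dominance, a constrained skyline, and filtering points used in the abstract can be illustrated with a minimal Python sketch. This is not the PaDSkyline algorithm itself. It assumes that smaller values are preferred in every dimension and that the constraint is a per-dimension range; the function names (dominates, skyline, constrained_skyline, filter_local_skyline) and the sample data are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q if p is no worse in every dimension and strictly
    better in at least one (assumption: smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: List[Point]) -> List[Point]:
    """Naive quadratic skyline: keep points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def constrained_skyline(points: List[Point],
                        lower: Sequence[float],
                        upper: Sequence[float]) -> List[Point]:
    """Skyline restricted to points inside the range [lower, upper]
    in every dimension (assumed form of the constraint)."""
    in_range = [p for p in points
                if all(lo <= x <= hi for x, lo, hi in zip(p, lower, upper))]
    return skyline(in_range)


def filter_local_skyline(local_skyline: List[Point],
                         filtering_points: List[Point]) -> List[Point]:
    """Drop local skyline points dominated by any filtering point that
    arrived with the query, so they are never sent over the network."""
    return [p for p in local_skyline
            if not any(dominates(f, p) for f in filtering_points)]


# Hypothetical data held by two sites.
site_a = [(1, 9), (3, 3), (9, 1), (5, 5)]
site_b = [(2, 10), (4, 4), (10, 2), (0, 12)]

sky_a = skyline(site_a)                       # local skyline of site A
sky_b = skyline(site_b)                       # local skyline of site B
survivors_b = filter_local_skyline(sky_b, sky_a)  # what site B still has to send
```

In this toy setting, site B would transmit only the points in survivors_b rather than its whole local skyline, which is the kind of traffic reduction the abstract attributes to filtering points. How many filtering points to send, and which ones, is what the paper's cost models and heuristics address; the sketch simply sends all of site A's local skyline.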
Keywords :
distributed databases; information filtering; optimisation; query processing; PaDSkyline; constrained skyline query processing; cost-efficient model; distributed data sites; filtering points; geographically scattered sites; intragroup optimization; large-scale unstructured distributed environment; multidimensional point set; multifiltering technique; network connections; novel algorithm framework; parallel skyline query processing; partition algorithm; query response time; relevant data; skyline computations; skyline query processes; unqualified local skyline points; Concurrent computing; Distributed computing; Distributed processing; Filtering; Information filters; Large-scale systems; Partitioning algorithms; Query processing; Scattering; Constrained skyline query; distributed query processing; filtering point
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
DOI :
10.1109/TKDE.2010.103