Title of article :
TJJE: An efficient algorithm for top-k join on massive data
Author/Authors :
Xixian Han، نويسنده , , Jianzhong Li، نويسنده , , Jinbao Wang، نويسنده , , Donghua Yang، نويسنده ,
Issue Information :
روزنامه با شماره پیاپی سال 2013
Abstract :
In many applications, top-k join is an important operation to return the k most important join tuples among the potentially huge answer space according to a given ranking function. PBRJ is an algorithm template that generalizes previous top-k join algorithms. In this paper, our analysis shows that PBRJ needs to maintain a large quantity of candidate tuples on massive data. Based on the analysis, this paper proposes a novel top-k join algorithm TJJE which is suitable for handling massive data. By some pre-computed information, TJJE first estimates an upper-bound on scan depth of each joined table. Then it determines the file that contains the join positional index pairs of the top-k join results. A novel algorithm is proposed to retrieve the required join tuples by a single sequential and selective scan on the joined tables. Finally, the top-k join results are obtained by a single scan on the retrieved join tuples. The correctness proof and cost analysis of TJJE are presented in this paper. Extensive experiments show that TJJE maintains up to three orders of magnitude fewer candidate tuples and obtains up to one order of magnitude speedup compared to PBRJ.
Keywords :
Top-k join , massive data , Scan depth , DATABASE , TJJE
Journal title :
Information Sciences
Journal title :
Information Sciences