DocumentCode :
710108
Title :
Ranking Candidate Networks of relations to improve keyword search over relational databases
Author :
de Oliveira, Pericles ; da Silva, Altigran ; de Moura, Edleno
Author_Institution :
Inst. de Comput., Univ. Fed. do Amazonas, Manaus, Brazil
fYear :
2015
fDate :
13-17 April 2015
Firstpage :
399
Lastpage :
410
Abstract :
Relational keyword search (R-KwS) systems based on schema graphs take the keywords from the input query, find the tuples and tables where these keywords occur, and look for ways to “connect” these keywords using information on referential integrity constraints, i.e., key/foreign key pairs. The result is a number of expressions, called Candidate Networks (CNs), which join relations where keywords occur in a meaningful way. These CNs are then evaluated, resulting in a number of join networks of tuples (JNTs) that are presented to the user as ranked answers to the query. As the number of CNs is potentially very high, handling them is very demanding in terms of both time and resources, so that, for certain queries, current systems may take too long to produce answers, and for others they may even fail to return results (e.g., by exhausting memory). Moreover, the quality of the CN evaluation may be compromised when a large number of CNs is processed. Based on observations made by other researchers and on our own findings on representative workloads, we argue that, although the number of possible Candidate Networks can be very high, only very few of them produce answers relevant to the user and are indeed worth processing. Thus, R-KwS systems can greatly benefit from methods for assessing the relevance of Candidate Networks, so that only those deemed relevant are evaluated. We propose in this paper an approach for ranking CNs based on their probability of producing relevant answers to the user. This relevance is estimated from the current state of the underlying database using a probabilistic Bayesian model we have developed. Experiments that we performed indicate that this model is able to place the relevant CNs among the top 4 in the ranking produced. In these experiments we also observed that processing only a few relevant CNs has a considerable positive impact, not only on the performance of processing keyword queries, but also on the quality of the results obtained.
Keywords :
Bayes methods; graph theory; probability; query processing; relational databases; CN evaluation; JNTs; R-KwS systems; candidate network ranking; input query; join networks of tuples; keyword query processing; probabilistic Bayesian model; referential integrity constraints; relational keyword search system; schema graphs; Algebra; Indexes; Joints; Probabilistic logic
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Engineering (ICDE), 2015 IEEE 31st International Conference on
Conference_Location :
Seoul
Type :
conf
DOI :
10.1109/ICDE.2015.7113301
Filename :
7113301