Author :
Baid, Akash ; Rae, I. ; Doan, AnHai ; Naughton, J.F.
Author_Institution :
Comput. Sci. Dept., Univ. of Wisconsin, Madison, WI, USA
Abstract :
Keyword search (KWS) over relational data, where answers are multiple tuples connected via joins, has received significant attention in the past decade. Numerous solutions have been proposed and many prototypes have been developed. Building on this rapid progress and on growing user needs, several RDBMS and Web companies, as well as academic research groups, have recently begun to examine how to build industrial-strength keyword search systems. This task clearly requires addressing many issues, including robustness, accuracy, reliability, and privacy, among others. A major emerging issue, however, appears to be performance-related: current KWS systems have unpredictable run times. In particular, for certain queries they take too long to produce answers, and for others they may fail to return at all (e.g., after exhausting memory). In this paper, we first examine this problem and argue that it is fundamental and unlikely to be solved in the near future by software or hardware advances. Next, we argue that in an industrial-strength setting, to ensure real-time interaction and facilitate user adoption, KWS systems should produce answers within an absolute time limit and then describe to users what could be done next, should they choose to continue. We then show how to realize these requirements for DISCOVER, an exemplar of a recent KWS solution approach. Our basic idea is to produce answers as today's KWS systems do, up to the time limit, and then show users these answers together with query forms that characterize the unexplored portion of the answer space. Finally, we present preliminary experiments over real-world data that demonstrate the feasibility of the proposed approach.
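A minimal sketch of the time-bounded strategy outlined in the abstract, assuming (in the spirit of DISCOVER) that the answer space is split into candidate networks, i.e. join trees over keyword tuple sets, evaluated smallest first. The names `CandidateNetwork`, `bounded_search`, and `describe_forms`, the size ordering, and rendering each unevaluated network as a join-path string standing in for a "query form" are all hypothetical illustrations, not the authors' implementation.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CandidateNetwork:
    # Hypothetical stand-in for a DISCOVER-style candidate network: a join
    # tree over keyword tuple sets, listed here as relation labels.
    relations: List[str]
    evaluate: Callable[[], List[tuple]]  # runs the join, returns answer tuples

    @property
    def size(self) -> int:
        return len(self.relations)


@dataclass
class BoundedResult:
    answers: List[tuple] = field(default_factory=list)
    # Networks not evaluated before the deadline (the unexplored answer space).
    unexplored: List[CandidateNetwork] = field(default_factory=list)


def bounded_search(networks: List[CandidateNetwork], time_limit_s: float,
                   clock: Callable[[], float] = time.monotonic) -> BoundedResult:
    """Evaluate networks smallest-first until the time budget is spent."""
    deadline = clock() + time_limit_s
    result = BoundedResult()
    ordered = sorted(networks, key=lambda cn: cn.size)
    for i, cn in enumerate(ordered):
        # The deadline is checked only between networks, not during one.
        if clock() >= deadline:
            result.unexplored = ordered[i:]
            break
        result.answers.extend(cn.evaluate())
    return result


def describe_forms(result: BoundedResult) -> List[str]:
    # Hypothetical "query form": the join path of each unexplored network,
    # which the user could choose to run next.
    return [" JOIN ".join(cn.relations) for cn in result.unexplored]
```

For example, with three networks of sizes 1, 3, and 5 and a budget that expires after the first two, `bounded_search` returns the answers of the two smaller networks and leaves the largest in `unexplored`, which `describe_forms` renders as a join path. Because this sketch checks the deadline only between networks, a single expensive network can still overrun the limit; a truly absolute limit would need cancellation inside evaluation.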
Keywords :
relational databases; search engines; DISCOVER solution approach; accuracy issue; industrial-strength setting; keyword search systems; privacy issue; relational data; reliability issue; robustness issue; Computer industry; Computer science; Hardware; Industrial relations; Keyword search; Privacy; Prototypes; Real time systems; Relational databases; Robustness;