DocumentCode :
1446994
Title :
Efficient Keyword-Based Search for Top-K Cells in Text Cube
Author :
Ding, Bolin ; Zhao, Bo ; Lin, Cindy Xide ; Han, Jiawei ; Zhai, ChengXiang ; Srivastava, Ashok ; Oza, Nikunj C.
Author_Institution :
Dept. of Comput. Sci., Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA
Volume :
23
Issue :
12
fYear :
2011
Firstpage :
1795
Lastpage :
1810
Abstract :
Previous studies on supporting free-form keyword queries over RDBMSs provide users with linked structures (e.g., a set of joined tuples) that are relevant to a given keyword query. Most of them focus on ranking individual tuples from one table or joins of multiple tables containing a set of keywords. In this paper, we study the problem of keyword search in a data cube with text-rich dimension(s) (so-called text cube). The text cube is built on a multidimensional text database, where each row is associated with some text data (a document) and other structural dimensions (attributes). A cell in the text cube aggregates a set of documents with matching attribute values in a subset of dimensions. We define a keyword-based query language and an IR-style relevance model for scoring/ranking cells in the text cube. Given a keyword query, our goal is to find the top-k most relevant cells. We propose four approaches: inverted-index one-scan, document sorted-scan, bottom-up dynamic programming, and search-space ordering. The search-space ordering algorithm explores only a small portion of the text cube for finding the top-k answers, and enables early termination. Extensive experimental studies are conducted to verify the effectiveness and efficiency of the proposed approaches.
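To make the cell and top-k notions in the abstract concrete, here is a minimal brute-force sketch in Python. It is not one of the four approaches the paper proposes, and the relevance function is a placeholder assumption (average keyword count per document), since the abstract does not state the paper's actual IR-style model. The names `doc_score`, `top_k_cells`, and the `"*"` marker for an aggregated dimension are hypothetical.

```python
from collections import defaultdict
from itertools import product
import heapq


def doc_score(text, keywords):
    # Placeholder relevance (an assumption): total occurrences of the query keywords.
    words = text.lower().split()
    return sum(words.count(k.lower()) for k in keywords)


def top_k_cells(rows, keywords, k):
    """rows: list of (attribute_tuple, document_text) pairs.

    Every cell fixes a subset of dimensions to concrete values and
    aggregates the rest, written here as "*".
    """
    cells = defaultdict(list)
    for attrs, text in rows:
        score = doc_score(text, keywords)
        # Each dimension either keeps its value or is aggregated away,
        # so one document contributes to 2^d cells for d dimensions.
        for cell in product(*[(a, "*") for a in attrs]):
            cells[cell].append(score)
    # Cell relevance (an assumption): mean score of the documents it aggregates.
    scored = ((sum(s) / len(s), cell) for cell, s in cells.items())
    return heapq.nlargest(k, scored)


rows = [
    (("laptop", "2010"), "battery overheating battery"),
    (("laptop", "2011"), "screen flicker"),
    (("phone", "2010"), "battery drain"),
]
print(top_k_cells(rows, ["battery"], 2))
```

This baseline materializes every non-empty cell, which is exponential in the number of dimensions; avoiding that cost is the motivation the abstract gives for search-space ordering with early termination.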
Keywords :
query languages; query processing; relational databases; text analysis; IR-style relevance model; RDBMS; bottom-up dynamic programming; document sorted-scan; free-form keyword queries; inverted-index one-scan; keyword search; keyword-based query language; keyword-based search; linked structures; multidimensional text database; search-space ordering; text cube; top-k cells; Computational modeling; Data models; Indexes; Information retrieval; Keyword search; Portable computers; Text analysis; data cube; multidimensional text data
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2011.34
Filename :
5710919