Title :
Enhancing Document Exploration with OLAP
Author :
Chen, Zhibo ; Garcia-Alvarado, Carlos ; Ordonez, Carlos
Author_Institution :
Univ. of Houston, Houston, TX, USA
Abstract :
Finding relevant documents in digital libraries is a well-studied problem in information retrieval. Still, it is not uncommon for users to browse digital collections without a clear idea of which keyword search to perform. However, we believe that such an initial query is not entirely independent of the target search. Therefore, we use the documents selected by this initial query as the starting point for further exploration. In this demonstration, we exploit On-line Analytical Processing (OLAP) for knowledge discovery in digital collections to achieve query refinement. The refinement proceeds in three steps: documents are ranked with a traditional technique based on the vector space model, the top keywords in the resulting subset of documents are selected, and certain cuboids of these keywords are then displayed. Based on these cuboids, which are ranked by frequency, users can select a query that better represents their actual target search. We show that this document exploration can be performed efficiently within the DBMS, using in-database extensions such as User-Defined Functions as well as standard SQL. Additionally, we demonstrate a novel approach to query refinement through OLAP data cubes.
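The three refinement steps described in the abstract can be illustrated with a small in-database sketch. This is not the authors' implementation. It assumes a hypothetical vertical table `term(doc_id, keyword, freq)` in an in-memory SQLite database. It uses a plain TF-IDF dot product as a stand-in for the vector-space ranking and registers a Python user-defined function (`py_ln`, hypothetical) for the logarithm. For each combination of top keywords, it reports only the cuboid cell in which all of the combination's keywords occur. The toy data and all function names are illustrative assumptions.

```python
import math
import sqlite3
from itertools import combinations

# Hypothetical toy collection in a vertical (doc_id, keyword, freq) layout.
ROWS = [
    (1, "olap", 3), (1, "cube", 2), (1, "sql", 1),
    (2, "olap", 2), (2, "cube", 1), (2, "udf", 1),
    (3, "olap", 1), (3, "cube", 1), (3, "sql", 2),
    (4, "retrieval", 3), (4, "ranking", 2),
    (5, "retrieval", 1), (5, "vector", 2),
    (6, "sql", 1), (6, "udf", 2),
]


def build_db(rows):
    conn = sqlite3.connect(":memory:")
    # Python UDF for the natural log, so the sketch does not rely on SQLite
    # having been compiled with its optional built-in math functions.
    conn.create_function("py_ln", 1, math.log)
    conn.execute("CREATE TABLE term (doc_id INTEGER, keyword TEXT, freq INTEGER)")
    conn.executemany("INSERT INTO term VALUES (?, ?, ?)", rows)
    return conn


def rank_documents(conn, query_terms, k):
    """Step 1: rank documents by a TF-IDF dot product with the query terms."""
    q = ",".join("?" for _ in query_terms)
    sql = f"""
        WITH n AS (SELECT COUNT(DISTINCT doc_id) AS total FROM term),
             idf AS (SELECT keyword, py_ln(1.0 * n.total / COUNT(*)) AS w
                     FROM term, n GROUP BY keyword)
        SELECT t.doc_id, SUM(t.freq * idf.w) AS score
        FROM term t JOIN idf ON t.keyword = idf.keyword
        WHERE t.keyword IN ({q})
        GROUP BY t.doc_id
        ORDER BY score DESC, t.doc_id
        LIMIT ?"""
    return [doc for doc, _ in conn.execute(sql, (*query_terms, k))]


def top_keywords(conn, docs, query_terms, m):
    """Step 2: most frequent non-query keywords in the retrieved subset."""
    d = ",".join("?" for _ in docs)
    q = ",".join("?" for _ in query_terms)
    sql = f"""SELECT keyword, SUM(freq) AS total FROM term
              WHERE doc_id IN ({d}) AND keyword NOT IN ({q})
              GROUP BY keyword ORDER BY total DESC, keyword LIMIT ?"""
    return [kw for kw, _ in conn.execute(sql, (*docs, *query_terms, m))]


def ranked_cuboids(conn, docs, keywords, max_dims=2):
    """Step 3: for each keyword combination (cuboid), count subset documents
    containing every keyword, then rank combinations by that frequency."""
    d = ",".join("?" for _ in docs)
    results = []
    for r in range(1, max_dims + 1):
        for combo in combinations(keywords, r):
            kq = ",".join("?" for _ in combo)
            sql = f"""SELECT COUNT(*) FROM (
                          SELECT doc_id FROM term
                          WHERE doc_id IN ({d}) AND keyword IN ({kq})
                          GROUP BY doc_id HAVING COUNT(DISTINCT keyword) = ?)"""
            (count,) = conn.execute(sql, (*docs, *combo, r)).fetchone()
            if count:
                results.append((combo, count))
    # Most frequent combinations first: candidate refined queries.
    results.sort(key=lambda item: (-item[1], item[0]))
    return results


if __name__ == "__main__":
    conn = build_db(ROWS)
    docs = rank_documents(conn, ["olap"], k=3)
    keywords = top_keywords(conn, docs, ["olap"], m=2)
    for combo, count in ranked_cuboids(conn, docs, keywords):
        print("olap " + " ".join(combo), count)
```

On this toy data, the initial query "olap" retrieves documents 1 to 3, and the most frequent combination suggests "olap cube" as a refinement. The pure SQL `GROUP BY ... HAVING` form keeps the counting inside the DBMS; only the loop over keyword combinations runs in Python.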
Keywords :
SQL; data mining; digital libraries; document handling; information retrieval; query processing; search problems; DBMS; OLAP data cube; digital library; document selection; enhancing document exploration; in-database extension; information retrieval; keyword search; knowledge discovery; online analytical processing; query refinement; query search; ranking technique; standard SQL; user browsing; user defined function; vector space model; Information Retrieval; OLAP; UDF;
Conference_Title :
Data Mining Workshops (ICDMW), 2010 IEEE International Conference on
Conference_Location :
Sydney, NSW
Print_ISBN :
978-1-4244-9244-2
Electronic_ISBN :
978-0-7695-4257-7
DOI :
10.1109/ICDMW.2010.37