Title :
Intensional associations in dataspaces
Author :
Salles, Marcos Antonio Vaz ; Dittrich, Jens ; Blunschi, Lukas
Author_Institution :
Cornell Univ., Ithaca, NY, USA
Abstract :
Dataspace applications necessitate the creation of associations among data items over time. For example, once information about people is extracted from sources on the Web, associations among them may emerge as a consequence of different criteria, such as their city of origin or their elected hobbies. In this paper, we advocate a declarative approach to specifying these associations. We propose that each set of associations be defined by an association trail. An association trail is a query-based definition of how items are connected by intensional (i.e., virtual) association edges to other items in the dataspace. We study the problem of processing neighborhood queries over such intensional association graphs. The naive approach to neighborhood query processing over intensional graphs is to materialize the whole graph and then apply previous work on dataspace graph indexing to answer queries. We present in this paper a novel indexing technique, the grouping-compressed index (GCI), that has better worst-case indexing cost than the naive approach. In our experiments, GCI is shown to provide an order of magnitude gain in indexing cost over the naive approach, while remaining competitive in query processing time.
Keywords :
data mining; database indexing; database management systems; query processing; Web; association trail; dataspace graph indexing; dataspaces applications; grouping compressed index; indexing technique; information extraction; intensional association edges; intensional association graphs; intensional associations; magnitude gain; naive approach; neighborhood query processing; query processing time; worst case indexing cost; Cities and towns; Content management; Costs; Data mining; Indexing; Information management; Query processing; Universal Serial Bus
Conference_Title :
Data Engineering (ICDE), 2010 IEEE 26th International Conference on
Conference_Location :
Long Beach, CA
Print_ISBN :
978-1-4244-5445-7
Electronic_ISBN :
978-1-4244-5444-0
DOI :
10.1109/ICDE.2010.5447833
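
Illustrative_Sketch :
The abstract contrasts materializing the whole intensional graph with an index that exploits grouping. The Python sketch below is only a toy illustration of that contrast, not the paper's GCI: it assumes an association trail that links two items whenever they share the same value of a single attribute (here a hypothetical `city` field), and all item names, data, and function names are invented for the example. Under that assumption, a neighborhood query can be answered from one group per attribute value instead of from a stored edge set that grows quadratically within each group.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy dataspace: item id -> attributes (not taken from the paper).
ITEMS = {
    "ann": {"city": "Ithaca"},
    "bob": {"city": "Ithaca"},
    "cy": {"city": "Zurich"},
    "dee": {"city": "Ithaca"},
}


def materialize_edges(items, attr):
    """Naive approach: explicitly store every association edge."""
    edges = set()
    for a, b in combinations(sorted(items), 2):
        value_a = items[a].get(attr)
        if value_a is not None and value_a == items[b].get(attr):
            edges.add((a, b))
    return edges


def neighbors_naive(edges, item_id):
    """Scan the materialized edge set for edges touching item_id."""
    return {b if a == item_id else a for a, b in edges if item_id in (a, b)}


def build_group_index(items, attr):
    """Grouping idea: store each item once, under its attribute value."""
    groups = defaultdict(set)
    for item_id, attrs in items.items():
        if attr in attrs:
            groups[attrs[attr]].add(item_id)
    return groups


def neighbors_grouped(items, groups, attr, item_id):
    """Derive neighbors from the item's group; no edges are ever stored."""
    key = items[item_id].get(attr)
    if key is None:
        return set()
    return groups.get(key, set()) - {item_id}


edges = materialize_edges(ITEMS, "city")
groups = build_group_index(ITEMS, "city")
```

In this toy setting both functions return the same neighborhoods, but the group index holds one entry per item, while the materialized edge set holds one entry per associated pair. The actual index structure and its cost analysis are described in the paper itself.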