  • DocumentCode
    2080430
  • Title
    Intensional associations in dataspaces
  • Author
    Salles, Marcos Antonio Vaz ; Dittrich, Jens ; Blunschi, Lukas
  • Author_Institution
    Cornell Univ., Ithaca, NY, USA
  • fYear
    2010
  • fDate
    1-6 March 2010
  • Firstpage
    984
  • Lastpage
    987
  • Abstract
    Dataspace applications necessitate the creation of associations among data items over time. For example, once information about people is extracted from sources on the Web, associations among them may emerge as a consequence of different criteria, such as their city of origin or their elected hobbies. In this paper, we advocate a declarative approach to specifying these associations. We propose that each set of associations be defined by an association trail. An association trail is a query-based definition of how items are connected by intensional (i.e., virtual) association edges to other items in the dataspace. We study the problem of processing neighborhood queries over such intensional association graphs. The naive approach to neighborhood query processing over intensional graphs is to materialize the whole graph and then apply previous work on dataspace graph indexing to answer queries. We present in this paper a novel indexing technique, the grouping-compressed index (GCI), that has better worst-case indexing cost than the naive approach. In our experiments, GCI is shown to provide an order of magnitude gain in indexing cost over the naive approach, while remaining competitive in query processing time.
  • Keywords
    data mining; database indexing; database management systems; query processing; Web; association trail; dataspace graph indexing; dataspaces applications; grouping compressed index; indexing technique; information extraction; intensional association edges; intensional association graphs; intensional associations; magnitude gain; naive approach; neighborhood query processing; query processing time; worst case indexing cost; Cities and towns; Content management; Costs; Data mining; Indexing; Information management; Query processing; Universal Serial Bus;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering (ICDE), 2010 IEEE 26th International Conference on
  • Conference_Location
    Long Beach, CA
  • Print_ISBN
    978-1-4244-5445-7
  • Electronic_ISBN
    978-1-4244-5444-0
  • Type
    conf
  • DOI
    10.1109/ICDE.2010.5447833
  • Filename
    5447833
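
Illustrative sketch (not part of the original record). The abstract contrasts materializing the whole intensional graph with an index that avoids doing so. The Python sketch below illustrates that contrast under strong assumptions: an association trail is modeled as a key function (items sharing a key value are associated), the sample people and the names materialize, build_group_index, and neighbors_grouped are hypothetical, and the grouping index shown is only a simple stand-in to show why skipping materialization can reduce indexing cost. It is not the paper's grouping-compressed index (GCI).

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical items, as if extracted from sources on the Web.
people = [
    {"id": "p1", "city": "Zurich", "hobby": "chess"},
    {"id": "p2", "city": "Zurich", "hobby": "hiking"},
    {"id": "p3", "city": "Zurich", "hobby": "chess"},
    {"id": "p4", "city": "Ithaca", "hobby": "chess"},
]

# Assumed form of an association trail: items with equal key are associated.
def same_city(person):
    return person["city"]

def materialize(items, key):
    """Naive approach: store every intensional edge explicitly."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item["id"])
    edges = set()
    for members in groups.values():
        # One directed edge per ordered pair: quadratic in the group size.
        edges.update(permutations(members, 2))
    return edges

def neighbors_materialized(edges, item_id):
    """Neighborhood query over the materialized edge set."""
    return {dst for src, dst in edges if src == item_id}

def build_group_index(items, key):
    """Stand-in index: store group membership only, linear in the item count."""
    item_to_key = {}
    key_to_items = defaultdict(set)
    for item in items:
        k = key(item)
        item_to_key[item["id"]] = k
        key_to_items[k].add(item["id"])
    return item_to_key, key_to_items

def neighbors_grouped(index, item_id):
    """Neighborhood query answered from group membership, without edges."""
    item_to_key, key_to_items = index
    return key_to_items[item_to_key[item_id]] - {item_id}

edges = materialize(people, same_city)
index = build_group_index(people, same_city)
```

Both query functions return the same neighborhood for any item. The difference is what gets stored: the materialized edge set grows with the square of each group's size, while the grouping structure stores one entry per item.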