• Title of article

    Weighting tags and paths in XML documents according to their topic generalization

  • Author/Authors

    Dexi Liu, Changxuan Wan, Lei Chen, Xiping Liu, Jian-Yun Nie

  • Issue Information
    Journal with serial number, year 2013
  • Pages
    19
  • From page
    48
  • To page
    66
  • Abstract
    Text-centric (or document-centric) XML document retrieval aims to rank search results according to their relevance to a given query. To do this, most existing methods rely mainly on content terms and often ignore an important factor – the XML tags and paths, which are useful in determining the important contents of a document. In some previous studies, each unique tag/path is assigned a weight based on domain (expert) knowledge. However, such manual assignment is both inefficient and subjective. In this paper, we propose an automatic method to infer the weights of tags/paths according to the topical relationship between the corresponding elements and the whole document. The better the corresponding element generalizes the document’s topic, the more important the tag/path is considered to be. We define a model based on Average Topic Generalization (ATG), which integrates several features used in previous studies. We evaluate the performance of the ATG-based model on two real data sets, the IEEECS collection and the Wikipedia collection, from two different perspectives: the correlation between the weights generated by ATG and those set by experts, and the performance of XML retrieval based on ATG. Experimental results show that the tag/path weights generated by ATG are highly correlated with the manually assigned weights, and that the ATG model significantly improves XML retrieval effectiveness.
  • Keywords
    XML retrieval , Topic generalization , Tag/path weighting model
  • Journal title
    Information Sciences
  • Serial Year
    2013
  • Record number

    1215807