Title of article :
Weighting tags and paths in XML documents according to their topic generalization
Author/Authors :
Dexi Liu, Changxuan Wan, Lei Chen, Xiping Liu, Jian-Yun Nie
Issue Information :
Journal with serial number, year 2013
Pages :
19
From page :
48
To page :
66
Abstract :
Text-centric (or document-centric) XML document retrieval aims to rank search results according to their relevance to a given query. Most existing methods rely mainly on content terms and often ignore an important factor: the XML tags and paths, which help identify the important contents of a document. In some previous studies, each unique tag/path is assigned a weight based on domain (expert) knowledge. However, such manual assignment is both inefficient and subjective. In this paper, we propose an automatic method to infer the weights of tags/paths from the topical relationship between the corresponding elements and the whole document. The better an element generalizes the document’s topic, the more important its tag/path is considered to be. We define a model based on Average Topic Generalization (ATG), which integrates several features used in previous studies. We evaluate the ATG-based model on two real data sets, the IEEECS collection and the Wikipedia collection, from two perspectives: the correlation between the weights generated by ATG and those set by experts, and the performance of XML retrieval based on ATG. Experimental results show that the tag/path weights generated by ATG are highly correlated with the manually assigned weights, and that the ATG model significantly improves XML retrieval effectiveness.
Keywords :
XML retrieval , Topic generalization , Tag/path weighting model
Journal title :
Information Sciences
Serial Year :
2013
Record number :
1215807