DocumentCode :
387565
Title :
Passage retrieval on Web data
Author :
Song, Rui-Hua ; Ma, Shao-Ping ; Zhang, Min
Author_Institution :
State Key Lab. of Intelligent Tech. & Syst., Tsinghua Univ., Beijing, China
Volume :
3
fYear :
2002
fDate :
2002
Firstpage :
1437
Abstract :
On the Web, a single document often covers several independent subtopics, i.e., it is multi-topic. Dividing such a document into passages that each correspond to only one topic improves retrieval performance. In this paper, features embedded in the HTML structure are used as evidence for passage segmentation. Experimental results on the TREC-9 10-gigabyte Web dataset show that the 11-point average precision of passage retrieval is higher than that of conventional document retrieval by about 9% on the collection of multi-topic documents and by about 1.6% on the whole document set. Further analysis indicates that the precision is actually higher if all the documents returned by passage retrieval are assessed.
Keywords :
Internet; feature extraction; hypermedia markup languages; information retrieval; HTML structure; Web data; features selection; information retrieval; multiple topic document retrieval; passage retrieval; passage segmentation; HTML; Hidden Markov models; Information retrieval;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Machine Learning and Cybernetics, 2002. Proceedings. 2002 International Conference on
Print_ISBN :
0-7803-7508-4
Type :
conf
DOI :
10.1109/ICMLC.2002.1167444
Filename :
1167444