Title :
An investigation of the TREC Web track datasets based on the hyperlink analysis algorithm
Author :
Liu, W.E. ; Zhang, Gang
Author_Institution :
Software Dept., Chinese Acad. of Sci., Beijing, China
Abstract :
One of the main aims of the TREC (Text REtrieval Conference) Web track has been to answer the question of whether link-based methods are better than keyword-based methods for Web search, but most participants, including us, have found that the hyperlink structure does not improve search effectiveness as some commercial search engines have claimed. This paper tries to find the reason by investigating the WT10G dataset, the .GOV dataset, the answer sets, and the TREC evaluation measure. We propose an assumption about link-based methods and verify its correctness on these two datasets. Through experiments, we show how link-based methods can obtain better results on the TREC datasets. Some suggestions for TREC dataset collection and the evaluation measure are also given in this paper.
Keywords :
Web sites; information retrieval; search engines; text analysis; .GOV dataset; WT10G dataset; Web search; Web track dataset; hyperlink analysis algorithm; keyword-based method; link-based method; search engine; text retrieval conference; Algorithm design and analysis; Benchmark testing; Information filtering; Information retrieval; Internet; NIST; Navigation; Search engines; Software algorithms; Web search;
Conference_Title :
Machine Learning and Cybernetics, 2003 International Conference on
Print_ISBN :
0-7803-8131-9
DOI :
10.1109/ICMLC.2003.1264522