Title :
Toward a robust data fusion for document retrieval
Author :
He, Daqing ; Wu, Dan
Author_Institution :
Sch. of Inf. Sci., Univ. of Pittsburgh, Pittsburgh, PA
Abstract :
This paper investigates signal boosting techniques for post-search data fusion in cases where the retrieval results being fused may be of low or diverse quality. In such situations, the effectiveness of a data fusion technique depends on its ability to boost the signals from relevant documents and to reduce the effect of noise, which often comes from low-quality retrieval results. Our studies on the Malach spoken document collection and the HARD collection demonstrate that CombMNZ, the most widely used data fusion method, lacks this ability. We therefore developed two signal boosting mechanisms on top of CombMNZ, resulting in two new fusion methods called WCombMNZ and WCombMWW. To examine the effectiveness of the two new methods, we conducted experiments on the Malach and HARD document collections. Our results show that the new methods significantly outperform CombMNZ when combining retrieval results that are low and diverse in quality. When the task is to combine retrieval results of similar quality, the scenario in which CombMNZ is most often applied, the two new methods still often obtain better fusion results, sometimes significantly so.
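For readers unfamiliar with the baseline, the following is a minimal sketch of standard CombMNZ: each document's normalized scores are summed across the retrieval runs, and the sum is multiplied by the number of runs that returned the document. The function names `min_max_normalize` and `comb_mnz`, the dictionary-based run format, and the choice of min-max normalization are illustrative assumptions, not details taken from the paper. The sketch does not attempt WCombMNZ or WCombMWW, whose signal boosting mechanisms are not specified in this abstract.

```python
from collections import defaultdict


def min_max_normalize(run):
    """Scale one run's scores into [0, 1] (assumed normalization scheme)."""
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        # A run with identical scores carries no ranking signal; map all to 1.0.
        return {doc: 1.0 for doc in run}
    return {doc: (score - lo) / (hi - lo) for doc, score in run.items()}


def comb_mnz(runs):
    """Fuse runs (each a dict of doc_id -> raw score) with standard CombMNZ.

    Fused score = (sum of normalized scores) * (number of runs returning the doc).
    Returns (doc_id, fused_score) pairs, best first; ties are broken by doc_id.
    """
    total = defaultdict(float)  # sum of normalized scores per document
    hits = defaultdict(int)     # number of runs that retrieved the document
    for run in runs:
        if not run:
            continue  # skip empty runs
        for doc, score in min_max_normalize(run).items():
            total[doc] += score
            hits[doc] += 1
    fused = {doc: total[doc] * hits[doc] for doc in total}
    return sorted(fused.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical toy runs: d2 appears in both, so its summed score is doubled.
run_a = {"d1": 10.0, "d2": 5.0, "d3": 0.0}
run_b = {"d2": 4.0, "d4": 2.0}
ranking = comb_mnz([run_a, run_b])
```

In this toy input, `d2` is ranked first because it is the only document retrieved by both runs. The same multiplier also rewards a document that several low-quality runs agree on, which is the kind of noise the abstract says CombMNZ cannot suppress.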
Keywords :
information retrieval; sensor fusion; HARD collection; Malach spoken document collection; WCombMNZ; WCombMWW; document retrieval; post-search data fusion; signal boosting techniques; Boosting; Diversity reception; Fusion power generation; Helium; Information management; Information resources; Noise reduction; Robustness; Thesauri; CombMNZ; Data fusion; Malach; Spoken document retrieval; TREC HARD
Conference_Title :
Natural Language Processing and Knowledge Engineering, 2008. NLP-KE '08. International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-4515-8
Electronic_ISBN :
978-1-4244-2780-2
DOI :
10.1109/NLPKE.2008.4906754