Title of article :
Improving probabilistic information retrieval by modeling burstiness of words
Author/Authors :
Zuobing Xu (author), Ram Akella (author)
Issue Information :
Bimonthly; serial issue, year 2010
Pages :
16
From page :
143
To page :
158
Abstract :
The classical probabilistic models attempt to capture the ad hoc information retrieval problem within a rigorous probabilistic framework. It has long been recognized that the primary obstacle to the effective performance of the probabilistic models is the need to estimate a relevance model. The Dirichlet compound multinomial (DCM) distribution, which is based on the Pólya urn scheme and can also be considered a hierarchical Bayesian model, is a more appropriate generative model for text documents than the traditional multinomial distribution. We explore a new probabilistic model based on the DCM distribution, which enables efficient retrieval and accurate ranking. Because the DCM distribution captures the dependency of repetitive word occurrences, the new probabilistic model based on this distribution is able to model the concavity of the score function more effectively. To avoid the empirical tuning of retrieval parameters, we design several parameter estimation algorithms to automatically set model parameters. Additionally, we propose a pseudo-relevance feedback algorithm based on the mixture modeling of the Dirichlet compound multinomial distribution to further improve retrieval accuracy. Finally, our experiments show that both the baseline probabilistic retrieval algorithm based on the DCM distribution and the corresponding pseudo-relevance feedback algorithm outperform the existing language modeling systems on several TREC retrieval tasks. The main objective of this research is to develop an effective probabilistic model based on the DCM distribution. A secondary objective is to provide a thorough understanding of the probabilistic retrieval model through a theoretical analysis of various text distribution assumptions.
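As a rough illustration of the burstiness idea in the abstract, the sketch below compares the standard DCM (Pólya) probability of a word-count vector with the multinomial probability of the same vector. It is not the authors' retrieval model or estimation algorithm; the function names `dcm_log_prob` and `multinomial_log_prob`, the two-word vocabulary, and the parameter values are assumptions chosen only for illustration.

```python
import math


def dcm_log_prob(counts, alpha):
    """Log-probability of a count vector under the Dirichlet compound
    multinomial (standard textbook form, not the paper's scoring function)."""
    n = sum(counts)
    a = sum(alpha)
    # Multinomial coefficient: n! / prod(x_w!)
    log_p = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    # Normalizer: Gamma(A) / Gamma(A + n)
    log_p += math.lgamma(a) - math.lgamma(a + n)
    # Per-word terms: Gamma(x_w + alpha_w) / Gamma(alpha_w)
    log_p += sum(
        math.lgamma(x + al) - math.lgamma(al) for x, al in zip(counts, alpha)
    )
    return log_p


def multinomial_log_prob(counts, probs):
    """Log-probability of a count vector under a plain multinomial."""
    n = sum(counts)
    log_p = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    log_p += sum(x * math.log(p) for x, p in zip(counts, probs) if x > 0)
    return log_p


# Hypothetical toy setup: two words, small alpha values (assumed, not from the paper).
alpha = [0.1, 0.1]
probs = [0.5, 0.5]  # multinomial with the same mean word proportions

repeated = [2, 0]  # the same word occurs twice
mixed = [1, 1]     # two different words occur once each

p_dcm_repeated = math.exp(dcm_log_prob(repeated, alpha))
p_dcm_mixed = math.exp(dcm_log_prob(mixed, alpha))
p_mult_repeated = math.exp(multinomial_log_prob(repeated, probs))
```

With small alpha values, the DCM assigns more probability to the repeated-word document than the multinomial with the same mean proportions does, which is one way to see the dependency of repeated word occurrences that the abstract refers to.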
Keywords :
Probabilistic retrieval model (PRM) , Dirichlet distribution , Language model (LM)
Journal title :
Information Processing and Management
Serial Year :
2010
Record number :
1229015