Author/Authors :
Gheorghe Muresan, David J. Harper
Abstract :
Clear and precise queries are a necessity when searching
very large document collections, especially when
query-based retrieval is the only means of exploration.
We propose system-mediated information access as a
solution to users’ well-documented inability to formulate
good queries. Our approach is based on two main
assumptions: first, on the ability of document clustering
to reveal the topical, semantic structure of a problem
domain represented by a specialized “source collection,”
and, second, on the capacity of statistical language
models to convey content. Taking the role of the
human mediator or intermediary searcher, a mediation
system interacts with the user and supports her exploration
of a relatively small source collection, chosen to
be representative of the problem domain. Based on the
user’s selection of relevant “exemplary” documents and
clusters from this source collection, the system builds a
language model of her information need. This model is
subsequently used to derive “mediated queries,” which
are expected to convey precisely and comprehensively
the user’s information need, and can be submitted by the
user to search any large and heterogeneous “target collections.”
We present results of experiments that simulated
various mediation strategies and compared the
effects of a variety of parameters, such as the similarity
measure, the weighting scheme, and the clustering
method, on mediation effectiveness. The results provide both
upper bounds on the performance that can potentially be
reached by real end users and a comparison between
the effectiveness of these strategies. The experimental
evidence suggests that information retrieval mediated
through a clustered specialized collection has the potential
to improve retrieval effectiveness significantly.
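
The step from selected "exemplary" documents to a "mediated query" can be
illustrated with a minimal sketch. The abstract does not say how the
language model is estimated or how query terms are chosen, so everything
below is an assumption made for illustration: a unigram model, linear
smoothing against the source collection (the hypothetical parameter lam),
and term ranking by each term's contribution to the divergence between the
need model and the source-collection model. The function names tokenize,
unigram_model, and mediated_query are hypothetical, and document clustering
is left out entirely.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (an assumption; keeps alphabetic tokens only).
    return [tok for tok in text.lower().split() if tok.isalpha()]


def unigram_model(docs):
    # Maximum-likelihood unigram model over a list of document strings.
    counts = Counter(tok for doc in docs for tok in tokenize(doc))
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def mediated_query(selected_docs, source_collection, k=5, lam=0.5):
    # Model of the information need, built from the user's selected documents.
    need = unigram_model(selected_docs)
    # Background model, built from the whole source collection.
    background = unigram_model(source_collection)
    scores = {}
    for term, p_need in need.items():
        # Small floor in case a selected document is not part of the source.
        p_bg = background.get(term, 1e-9)
        # Linear smoothing of the need model with the background model.
        p_smoothed = lam * p_need + (1 - lam) * p_bg
        # Term's contribution to the divergence from the background model.
        scores[term] = p_smoothed * math.log(p_smoothed / p_bg)
    # Highest-scoring terms first; ties are broken alphabetically.
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return ranked[:k]


# Toy data, invented for this sketch.
source = [
    "document clustering reveals topical structure",
    "language models convey document content",
    "document about cooking pasta",
    "document about boiling water",
]
selected = source[:2]  # the documents the user marked as relevant
query = mediated_query(selected, source, k=3)
```

In this toy setting, a term such as "document" that is spread across the
whole source collection scores below terms concentrated in the selected
documents, so it is ranked last. The returned term list could then be
submitted as a query to a separate target collection.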