Title of article
An Analysis of the Relative Hardness of Reuters-21578 Subsets
Author/Authors
Franca Debole and Fabrizio Sebastiani (authors)
Issue Information
Monthly, with serial issue number, year 2005
Pages
13
From page
584
To page
596
Abstract
The existence, public availability, and widespread acceptance
of a standard benchmark for a given information
retrieval (IR) task are beneficial to research on
this task, because they allow different researchers to
experimentally compare their own systems by comparing
the results they have obtained on this benchmark.
The Reuters-21578 test collection, together with
its earlier variants, has been such a standard benchmark
for the text categorization (TC) task throughout
the last 10 years. However, the benefits that this has
brought about have somehow been limited by the fact
that different researchers have “carved” different subsets
out of this collection and tested their systems on
one of these subsets only; systems that have been
tested on different Reuters-21578 subsets are thus not
readily comparable. In this article, we present a systematic,
comparative experimental study of the three
subsets of Reuters-21578 that have been most popular
among TC researchers. The results we obtain allow us
to determine the relative hardness of these subsets,
thus establishing an indirect means for comparing TC
systems that have been, or will be, tested on these different
subsets.
Journal title
Journal of the American Society for Information Science and Technology
Serial Year
2005
Record number
843930