Title of article :
Information Filtering in TREC-9 and TDT-3: A Comparative Analysis
Author/Authors :
Yang, Yiming (author) , Ault, Thomas Galen (author)
Issue Information :
Journal with serial number, year 2002
From page :
159
To page :
0
Abstract :
Much work on automated information filtering has been done in the TREC and TDT domains, but differences in corpora, the nature of TREC topics vs. TDT events, the constraints imposed on training and testing, and the choices of performance measures confound any meaningful comparison between these domains. We attempt to bridge the gap between them by evaluating the performance of the k-nearest-neighbor (kNN) classification system on the corpus and categories from one domain using the constraints of the other. To maximize comparability and understand the effect of the evaluation metrics specific to each domain, we optimize the performance of kNN separately for the F1, T9P (the preferred metric for TREC-9) and Ctrk (the official metric for TDT-3) metrics. A thorough comparison of our within-domain and cross-domain results demonstrates that the corpus used for TREC-9 is more challenging for an information filtering system than the TDT-3 corpus and strongly suggests that the TDT-3 event tracking task itself is more difficult than the TREC batch filtering task. We also show that optimizing performance in TREC-9 and TDT-3 tends to result in systems with different performance characteristics, confounding any meaningful comparison between the two domains, and that T9P and Ctrk both have properties that make them undesirable as general information filtering metrics.
Keywords :
information filtering , TREC , TDT , topic tracking
Journal title :
INFORMATION RETRIEVAL
Serial Year :
2002
Record number :
89769