Title :
Optimal Estimation of Rejection Thresholds for Topic Spotting
Author :
Subramanian, Kartick ; Prasad, Ranga ; Natarajan, Prem ; Schwartz, R.
Author_Institution :
BBN Technol., Cambridge, MA, USA
Abstract :
In many applications of topic spotting technology, especially those that require a human review of in-topic documents, a low false alarm rate is a key requirement. Topic spotting techniques typically include a rejection scheme to filter out off-topic documents. In this paper we present a robust methodology for rejecting off-topic messages that, in addition to modeling the topics of interest, uses a so-called alternate model for topics that are not included in the set of topics of interest. Specifically, we introduce two novel techniques for estimating topic-specific rejection thresholds: a parametric technique that can be viewed as a transformation of topic-independent thresholds, and a nonparametric technique based on constrained optimization of false rejections subject to a pre-specified number of false acceptances. Our experiments on newsgroup messages demonstrate that, when adequate training data is available, topic-specific threshold estimation techniques can outperform topic-independent thresholds in terms of the ROC curve.
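The nonparametric idea in the abstract (minimize false rejections while keeping false acceptances within a fixed budget) can be illustrated with a minimal single-topic sketch. This is not the authors' algorithm, which this record does not describe in detail. It assumes that a document is accepted when its score is at or above the threshold. The names `count_errors`, `estimate_threshold`, and `max_false_accepts`, and all scores, are hypothetical.

```python
import math


def count_errors(threshold, in_topic, off_topic):
    """Count errors for one topic, assuming 'accept' means score >= threshold."""
    false_rejects = sum(s < threshold for s in in_topic)    # in-topic docs rejected
    false_accepts = sum(s >= threshold for s in off_topic)  # off-topic docs accepted
    return false_rejects, false_accepts


def estimate_threshold(in_topic, off_topic, max_false_accepts):
    """Pick the threshold with the fewest false rejections whose false
    acceptances stay within the budget (hypothetical per-topic formulation).

    Returns (threshold, false_rejects, false_accepts).
    """
    # Only observed scores can change the error counts; +inf rejects everything
    # and is therefore always feasible for a non-negative budget.
    candidates = sorted(set(in_topic) | set(off_topic)) + [math.inf]
    best = None
    for t in candidates:
        fr, fa = count_errors(t, in_topic, off_topic)
        if fa > max_false_accepts:
            continue  # violates the false-acceptance constraint
        if best is None or fr < best[0]:
            best = (fr, fa, t)
    fr, fa, t = best
    return t, fr, fa


# Hypothetical held-out scores for one topic (illustrative values only).
in_topic_scores = [2.1, 1.4, 0.9, 0.3]
off_topic_scores = [1.0, 0.5, -0.2, -1.3]

result_budget_1 = estimate_threshold(in_topic_scores, off_topic_scores, 1)
result_budget_0 = estimate_threshold(in_topic_scores, off_topic_scores, 0)
print(result_budget_1, result_budget_0)
```

Tightening the budget from one false acceptance to zero pushes the threshold up and costs an extra false rejection, which is the trade-off traced by an ROC curve.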
Keywords :
document handling; pattern classification; false acceptances; false rejections subject; in-topic documents; nonparametric technique; off-topic documents; rejection thresholds; topic classification; topic spotting; topic-independent thresholds; topic-specific rejection thresholds; Broadcasting; Classification algorithms; Constraint optimization; Filters; Hidden Markov models; Humans; IP networks; Probability distribution; Robustness; Training data; Hidden Markov Models; Rejection Algorithms; Topic Classification;
Conference_Title :
Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on
Conference_Location :
Honolulu, HI
Print_ISBN :
1-4244-0727-3
DOI :
10.1109/ICASSP.2007.367168