Title :
Density estimation-based document categorization using von Mises-Fisher kernels
Author :
Skabar, Andrew ; Memon, Saud A.
Author_Institution :
Dept. of Comput. Sci. & Comput. Eng., La Trobe Univ., Melbourne, VIC, Australia
Abstract :
Although classifiers such as Support Vector Machines (SVMs) and k Nearest Neighbors (kNN) are able to achieve excellent document classification performance as measured by common information retrieval measures such as F1 score and breakeven point, they do not produce output values that can reliably be interpreted as probabilities. This means that without additional post-processing of their outputs, these classifiers are not capable of answering a question as simple as “To which classes does a document belong with a minimum probability of 80%?” This paper presents a density estimation-based classification technique which outputs probabilities directly. The technique is based on estimating densities using von Mises-Fisher kernels, and combining these under Bayes' Theorem to arrive at posterior probabilities of class membership. Results of applying the technique to the Reuters-21578 dataset show that the technique is computationally feasible, that its classification performance as measured by F1 score is comparable to that of SVMs and better than that of kNN classifiers, and that the output values can be interpreted as well-calibrated probabilities.
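The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes single-label classification (Reuters-21578 is multi-label, and the abstract does not say how that is handled), a single shared concentration parameter `kappa`, and class priors taken from training frequencies. Under the shared-`kappa` assumption the von Mises-Fisher normalizing constant is the same for every class and cancels in Bayes' theorem, so only the term exp(kappa * cosine similarity) is needed. The function names, the toy vectors, and the labels are all hypothetical.

```python
import numpy as np


def l2_normalize(X):
    """Scale each row to unit length; all-zero rows are left unchanged."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms)


def vmf_posteriors(X_train, y_train, X_query, kappa=20.0):
    """Posterior class probabilities from vMF kernel density estimates.

    Assumption: one shared `kappa` for all kernels, so the vMF
    normalizing constant cancels and is never computed.
    """
    X_train = l2_normalize(np.asarray(X_train, dtype=float))
    X_query = l2_normalize(np.asarray(X_query, dtype=float))
    y_train = np.asarray(y_train)
    classes = np.unique(y_train)

    # Log of the unnormalized kernel: kappa times cosine similarity.
    log_k = kappa * (X_query @ X_train.T)  # shape (n_query, n_train)

    log_joint = np.empty((X_query.shape[0], classes.size))
    for j, c in enumerate(classes):
        mask = y_train == c
        n_c = mask.sum()
        lk = log_k[:, mask]
        # Stable log of the mean kernel value over class c (log-sum-exp).
        m = lk.max(axis=1, keepdims=True)
        log_density = m[:, 0] + np.log(np.exp(lk - m).sum(axis=1)) - np.log(n_c)
        log_prior = np.log(n_c / y_train.size)  # prior from class frequency
        log_joint[:, j] = log_density + log_prior

    # Bayes' theorem: divide each joint by the sum over classes.
    log_joint -= log_joint.max(axis=1, keepdims=True)
    post = np.exp(log_joint)
    post /= post.sum(axis=1, keepdims=True)
    return classes, post


# Toy term-weight vectors; the labels are illustrative only.
X_train = [[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0, 0.9, 0.1]]
y_train = ["grain", "grain", "trade", "trade"]
classes, post = vmf_posteriors(X_train, y_train, [[1, 0.05, 0]])

# Classes to which the query belongs with probability of at least 80%.
confident = classes[post[0] >= 0.8]
```

Because this sketch normalizes over mutually exclusive classes, at most one class can reach the 80% threshold for a given document. A multi-label setting would more plausibly apply the same estimate once per category, comparing the density of documents inside the category with the density of those outside it; that variant is likewise an assumption rather than something the abstract states.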
Keywords :
Bayes methods; document handling; information retrieval; pattern classification; probability; support vector machines; Bayes theorem; F1 score; Reuters-21578 dataset; breakeven point; class membership; density estimation; document categorization; document classification; information retrieval measures; k nearest neighbors; posterior probability; von Mises-Fisher kernels; Equations; Kernel; Mathematical model; Nearest neighbor searches; Reliability; Training; Training data
Conference_Title :
Neural Networks (IJCNN), The 2010 International Joint Conference on
Conference_Location :
Barcelona
Print_ISBN :
978-1-4244-6916-1
DOI :
10.1109/IJCNN.2010.5596595