DocumentCode
2491403
Title
Density estimation-based document categorization using von Mises-Fisher kernels
Author
Skabar, Andrew ; Memon, Saud A.
Author_Institution
Dept. of Comput. Sci. & Comput. Eng., La Trobe Univ., Melbourne, VIC, Australia
fYear
2010
fDate
18-23 July 2010
Firstpage
1
Lastpage
8
Abstract
Although classifiers such as Support Vector Machines (SVMs) and k-Nearest Neighbors (kNN) achieve excellent document classification performance as measured by common information retrieval metrics such as the F1 score and the breakeven point, they do not produce output values that can reliably be interpreted as probabilities. This means that, without additional post-processing of their outputs, these classifiers cannot answer a question as simple as “To which classes does a document belong with a minimum probability of 80%?” This paper presents a density estimation-based classification technique that outputs probabilities directly. The technique estimates class-conditional densities using von Mises-Fisher kernels and combines these under Bayes' theorem to arrive at posterior probabilities of class membership. Results of applying the technique to the Reuters-21578 dataset show that it is computationally feasible, that its classification performance as measured by F1 score is comparable to that of SVMs and better than that of kNN classifiers, and that its output values can be interpreted as well-calibrated probabilities.
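As a rough illustration of the classification rule described above, the following is a minimal sketch (not the authors' implementation) that estimates each class-conditional density with von Mises-Fisher kernels centered on the unit-normalized training documents and combines the estimates under Bayes' theorem. The function names `normalize` and `vmf_posteriors`, the single shared concentration parameter `kappa`, the use of training-set class frequencies as priors, and the toy term-count vectors are all hypothetical choices made for illustration. Under those assumptions the vMF normalizing constant C_d(kappa) and the 1/n_c density factor cancel, so the posterior for a class is simply that class's share of the total kernel mass.

```python
import math
from collections import defaultdict


def normalize(v):
    """Scale a term vector to unit L2 norm so it lies on the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def vmf_posteriors(x, train_vectors, train_labels, kappa):
    """Return {class: P(class | x)} from von Mises-Fisher kernel density estimates.

    Hypothetical simplifications (not taken from the paper):
    - every kernel shares one concentration parameter `kappa`;
    - class priors are the training-set class frequencies n_c / N.
    Then P(c) * p(x | c) = (n_c / N) * (1 / n_c) * sum_i C_d(kappa) * exp(kappa * <x, x_i>),
    and the factors 1 / N and C_d(kappa) are the same for every class, so they
    cancel when the posterior is normalized.
    """
    x = normalize(x)
    # kappa * <x, x_i> is the log of each (unnormalized) vMF kernel;
    # for unit vectors the dot product is the cosine similarity.
    logs = [kappa * sum(a * b for a, b in zip(x, normalize(t))) for t in train_vectors]
    # Subtract the largest log-kernel before exponentiating to avoid overflow
    # when kappa is large; the shift cancels in the final ratio.
    shift = max(logs)
    mass = defaultdict(float)
    for log_k, label in zip(logs, train_labels):
        mass[label] += math.exp(log_k - shift)
    total = sum(mass.values())
    return {label: m / total for label, m in mass.items()}


if __name__ == "__main__":
    # Toy term-count vectors and labels, made up for illustration.
    docs = [[3, 1, 0], [2, 1, 0], [0, 1, 3], [0, 2, 2]]
    labels = ["earn", "earn", "grain", "grain"]
    post = vmf_posteriors([1, 0, 0], docs, labels, kappa=10.0)
    # Classes whose posterior probability is at least 80%.
    print([c for c, p in post.items() if p >= 0.8])  # prints ['earn']
```

The sketch treats categories as mutually exclusive. Because Reuters-21578 documents can carry several categories, one plausible multi-label adaptation (again an assumption) is to run the same rule once per category with labels such as "earn" versus "not earn". Setting `kappa` to 0 makes every kernel equal to 1, so the posteriors fall back to the class priors; larger values concentrate each kernel more tightly around its training document.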
Keywords
Bayes methods; document handling; information retrieval; pattern classification; probability; support vector machines; Bayes theorem; F1 score; Reuters-21578 dataset; breakeven point; class membership; density estimation; document categorization; document classification; information retrieval measures; k nearest neighbors; posterior probability; von Mises-Fisher kernels; Equations; Kernel; Mathematical model; Nearest neighbor searches; Reliability; Training; Training data
fLanguage
English
Publisher
IEEE
Conference_Titel
Neural Networks (IJCNN), The 2010 International Joint Conference on
Conference_Location
Barcelona
ISSN
1098-7576
Print_ISBN
978-1-4244-6916-1
Type
conf
DOI
10.1109/IJCNN.2010.5596595
Filename
5596595