  • DocumentCode
    2491403
  • Title
    Density estimation-based document categorization using von Mises-Fisher kernels
  • Author
    Skabar, Andrew; Memon, Saud A.
  • Author_Institution
    Dept. of Comput. Sci. & Comput. Eng., La Trobe Univ., Melbourne, VIC, Australia
  • fYear
    2010
  • fDate
    18-23 July 2010
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    Although classifiers such as Support Vector Machines (SVMs) and k Nearest Neighbors (kNN) are able to achieve excellent document classification performance as measured by common information retrieval measures such as F1 score and breakeven point, they do not produce output values that can reliably be interpreted as probabilities. This means that without additional post-processing of their outputs, these classifiers are not capable of answering a question as simple as “To which classes does a document belong with a minimum probability of 80%?” This paper presents a density estimation-based classification technique which outputs probabilities directly. The technique is based on estimating densities using von Mises-Fisher kernels, and combining these under Bayes' theorem to arrive at posterior probabilities of class membership. Results of applying the technique to the Reuters-21578 dataset show that the technique is computationally feasible, that its classification performance as measured by F1 score is comparable to that of SVMs and better than that of kNN classifiers, and that the output values can be interpreted as well-calibrated probabilities.
  • Keywords
    Bayes methods; document handling; information retrieval; pattern classification; probability; support vector machines; Bayes theorem; F1 score; Reuters-21578 dataset; breakeven point; class membership; density estimation; document categorization; document classification; information retrieval measures; k nearest neighbors; posterior probability; von Mises-Fisher kernels; Equations; Kernel; Mathematical model; Nearest neighbor searches; Reliability; Training; Training data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks (IJCNN), The 2010 International Joint Conference on
  • Conference_Location
    Barcelona
  • ISSN
    1098-7576
  • Print_ISBN
    978-1-4244-6916-1
  • Type
    conf
  • DOI
    10.1109/IJCNN.2010.5596595
  • Filename
    5596595
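
The approach summarized in the abstract (kernel density estimates built from von Mises-Fisher kernels, combined under Bayes' theorem) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes mutually exclusive classes, class priors taken from training frequencies, and a single concentration parameter `kappa` shared by all kernels, so the von Mises-Fisher normalizing constant cancels in the posterior. The function names, the value of `kappa`, and the toy vectors are all hypothetical.

```python
import numpy as np

def l2_normalize(X):
    """Scale each row to unit length; vMF kernels live on the unit sphere."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms)

def vmf_kde_posteriors(X_train, y_train, X_query, kappa=20.0):
    """Posterior class probabilities from per-class vMF kernel density estimates.

    Assumption: one shared kappa for every kernel, so the vMF normalizing
    constant is identical across classes and drops out after normalization.
    """
    X_train = l2_normalize(np.asarray(X_train, dtype=float))
    X_query = l2_normalize(np.asarray(X_query, dtype=float))
    y_train = np.asarray(y_train)
    classes = np.unique(y_train)

    # Log of the unnormalized vMF kernel: kappa * cosine similarity.
    log_kernel = kappa * (X_query @ X_train.T)  # shape (n_query, n_train)

    log_joint = np.empty((X_query.shape[0], classes.size))
    for j, c in enumerate(classes):
        mask = y_train == c
        block = log_kernel[:, mask]
        # Log of the mean kernel value over class c, computed stably.
        peak = block.max(axis=1, keepdims=True)
        log_density = peak[:, 0] + np.log(np.exp(block - peak).mean(axis=1))
        log_prior = np.log(mask.mean())  # prior = class frequency
        log_joint[:, j] = log_prior + log_density

    # Bayes' theorem: normalize prior * density across classes.
    log_joint -= log_joint.max(axis=1, keepdims=True)
    posteriors = np.exp(log_joint)
    posteriors /= posteriors.sum(axis=1, keepdims=True)
    return classes, posteriors

# Toy term-weight vectors (hypothetical data): two documents per class.
X_train = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.9, 0.1]]
y_train = [0, 0, 1, 1]
X_query = [[1.0, 0.05, 0.0]]

classes, posteriors = vmf_kde_posteriors(X_train, y_train, X_query)
# Classes whose posterior meets the 80% threshold mentioned in the abstract.
confident = [classes[posteriors[0] >= 0.8].tolist()]
```

In practice `kappa` acts as a bandwidth: larger values make each kernel more peaked around its training document, so it would normally be tuned on held-out data rather than fixed as here.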