• DocumentCode
    3319602
  • Title
    Comparing SVM and naïve Bayes classifiers for text categorization with Wikitology as knowledge enrichment
  • Author
    Hassan, Sundus; Rafi, Muhammad; Shaikh, Muhammad Shahid
  • Author_Institution
    Comput. Sci. Dept., NUCES-FAST, Karachi, Pakistan
  • fYear
    2011
  • fDate
    22-24 Dec. 2011
  • Firstpage
    31
  • Lastpage
    34
  • Abstract
    Text categorization is the task of labeling documents according to their content. Many experiments have tried to improve text categorization by enriching documents with background knowledge from repositories such as WordNet, the Open Directory Project (ODP), Wikipedia and Wikitology. In our previous work, we carried out extensive experiments in which knowledge extracted from Wikitology was evaluated with a Support Vector Machine (SVM) under 10-fold cross-validation; the results indicated that Wikitology performs far better than the other knowledge bases. In this paper we compare SVM and Naïve Bayes (NB) classifiers under text enrichment through Wikitology. We validated the results with 10-fold cross-validation and show that, compared with the baseline results, NB gives an improvement of +28.78% whereas SVM gives an improvement of +6.36%. The Naïve Bayes classifier is therefore the better choice when text is enriched through an external knowledge base.
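    The enrichment-plus-evaluation setup summarized in the abstract can be illustrated with a minimal, self-contained sketch; it is not the authors' implementation. It assumes a multinomial Naïve Bayes model with Laplace smoothing over bag-of-words tokens, a simple unstratified k-fold split, and a hypothetical `concepts` dictionary standing in for the terms a knowledge base such as Wikitology might return. The names `enrich`, `k_fold_accuracy` and `relative_improvement` are illustrative, and treating the reported percentages as relative gains over the baseline score is an assumption the abstract does not confirm. The SVM side of the comparison is not sketched; a library class such as scikit-learn's `LinearSVC` would typically fill that role.

```python
import math
from collections import Counter


def enrich(tokens, concepts):
    """Append concept terms from a hypothetical knowledge-base dict
    (a stand-in for Wikitology; the real lookup is not described here)."""
    return tokens + [c for t in tokens for c in concepts.get(t, [])]


class MultinomialNB:
    """Multinomial naive Bayes over token lists with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        n = len(labels)
        # Class priors P(c) estimated from label frequencies.
        self.log_prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        # Per-class term counts and totals used for P(t | c).
        self.counts = {c: Counter() for c in self.classes}
        for tokens, c in zip(docs, labels):
            self.counts[c].update(tokens)
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        self.vocab = set()
        for c in self.classes:
            self.vocab.update(self.counts[c])
        return self

    def log_posterior(self, tokens, c):
        denom = self.totals[c] + self.alpha * len(self.vocab)
        # Words never seen in training are skipped rather than smoothed.
        return self.log_prior[c] + sum(
            math.log((self.counts[c][t] + self.alpha) / denom)
            for t in tokens
            if t in self.vocab
        )

    def predict(self, tokens):
        # Ties go to the first class in sorted order.
        return max(self.classes, key=lambda c: self.log_posterior(tokens, c))


def k_fold_accuracy(docs, labels, k=10):
    """Mean accuracy over k folds; fold i tests documents i, i+k, i+2k, ...
    Assumes len(docs) >= k so that no fold is empty."""
    fold_scores = []
    for i in range(k):
        test_idx = set(range(i, len(docs), k))
        train_docs = [d for j, d in enumerate(docs) if j not in test_idx]
        train_labels = [y for j, y in enumerate(labels) if j not in test_idx]
        model = MultinomialNB().fit(train_docs, train_labels)
        hits = sum(model.predict(docs[j]) == labels[j] for j in test_idx)
        fold_scores.append(hits / len(test_idx))
    return sum(fold_scores) / k


def relative_improvement(baseline, enriched):
    """Percentage gain of an enriched score over a baseline score."""
    return (enriched - baseline) / baseline * 100.0


if __name__ == "__main__":
    docs = [["goal", "team"], ["striker"], ["orbit", "launch"], ["astronaut"]]
    labels = ["sport", "sport", "space", "space"]
    concepts = {  # hypothetical knowledge-base output
        "goal": ["Football"], "striker": ["Football"],
        "orbit": ["Spaceflight"], "astronaut": ["Spaceflight"],
    }
    base = k_fold_accuracy(docs, labels, k=2)
    richer = k_fold_accuracy([enrich(d, concepts) for d in docs], labels, k=2)
    print(base, richer)                        # → 0.5 1.0
    print(relative_improvement(base, richer))  # → 100.0
```

In this toy corpus the plain training and test folds share no vocabulary, so Naïve Bayes falls back on its priors; the added concept terms give them common features. The example only shows the mechanism and does not reproduce the paper's figures.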
  • Keywords
    Bayes methods; Web sites; classification; knowledge acquisition; knowledge based systems; support vector machines; text analysis; NB classifiers; ODP; Open Directory Project; SVM; Wikipedia; Wikitology; WordNet; background knowledge; document labelling; external knowledge base; knowledge bases; knowledge enrichment; knowledge extraction; knowledge repository; naïve Bayes classifiers; support vector machine; text categorization enhancement; text enrichment; Electronic publishing; Encyclopedias; Internet; Support vector machines; 20 Newsgroups; Knowledge base; Machine Learning; Naïve Bayes; Support Vector Machine; Text Categorization; Wikitology
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Multitopic Conference (INMIC), 2011 IEEE 14th International
  • Conference_Location
    Karachi
  • Print_ISBN
    978-1-4577-0654-7
  • Type
    conf
  • DOI
    10.1109/INMIC.2011.6151495
  • Filename
    6151495