Title :
Categorizing the Document Using Multi Class Classification in Data Mining
Author :
Joshi, Shweta ; Nigam, Bhawna
Author_Institution :
Dept. of Comput. Eng., Devi Ahilya Vishwavidyalaya, Indore, India
Abstract :
Classification is the process of dividing data into a number of groups, which may be either dependent on or independent of each other, with each group acting as a class. Classification can be performed by several methods using different types of classifiers. However, classification is not easy when it is applied to text documents, that is, in document classification. The main purpose of this paper is to analyze the task of multi-class document classification and to learn how high classification accuracy can be achieved in the context of text documents. The Naive Bayes approach is used to deal with the problem of document classification via a deceptively simplistic model: assume that all features are independent of one another, and assign each document to the class with the maximal probability. The Naive Bayes approach is applied in a flat (linear) and in a hierarchical manner to improve the efficiency of the classification model. It has been found that the hierarchical classification technique is more effective than flat classification. It also performs better in the case of multi-label document classification. The dataset used for evaluation was collected from the UCI repository, with some modifications made by the authors.
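A minimal sketch of the two schemes the abstract contrasts may help make the "maximal probability" rule concrete. It assumes a multinomial Naive Bayes model over bag-of-words tokens with add-one (Laplace) smoothing, scoring each class c as log P(c) + Σ log P(w | c) over the document's words; the abstract does not state the event model or the smoothing used. The two-level hierarchy, the class names (`sports`, `politics`, `football`, `fiscal`, ...) and the toy documents are hypothetical illustrations, not the authors' modified UCI data or their implementation.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Multinomial Naive Bayes over token lists, with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {w for d in docs for w in d}
        class_counts = Counter(labels)
        # log P(c): class prior estimated from label frequencies
        self.log_prior = {c: math.log(class_counts[c] / len(docs)) for c in self.classes}
        word_counts = defaultdict(Counter)
        for d, c in zip(docs, labels):
            word_counts[c].update(d)
        v = len(self.vocab)
        self.log_likelihood = {}
        for c in self.classes:
            total = sum(word_counts[c].values())
            # log P(w | c); add-one smoothing gives words unseen in class c nonzero mass
            self.log_likelihood[c] = {
                w: math.log((word_counts[c][w] + 1) / (total + v)) for w in self.vocab
            }
        return self

    def predict(self, doc):
        # Independence assumption: score = log prior + sum of per-word log-likelihoods.
        # Words outside the training vocabulary are ignored.
        def score(c):
            return self.log_prior[c] + sum(
                self.log_likelihood[c][w] for w in doc if w in self.vocab
            )

        return max(self.classes, key=score)


class HierarchicalNB:
    """Two-level scheme: pick a parent category first, then a leaf class within it."""

    def __init__(self, parent_of):
        self.parent_of = parent_of  # hypothetical mapping: leaf label -> parent label

    def fit(self, docs, labels):
        parents = [self.parent_of[y] for y in labels]
        self.top = MultinomialNB().fit(docs, parents)
        # One child classifier per parent, trained only on that parent's documents
        self.children = {}
        for p in set(parents):
            idx = [i for i, q in enumerate(parents) if q == p]
            self.children[p] = MultinomialNB().fit(
                [docs[i] for i in idx], [labels[i] for i in idx]
            )
        return self

    def predict(self, doc):
        return self.children[self.top.predict(doc)].predict(doc)


# Hypothetical toy corpus: four leaf classes grouped under two parents
docs = [s.split() for s in [
    "goal match striker", "match referee goal",
    "wicket bowler over", "bowler wicket pitch",
    "vote election party", "party minister vote",
    "budget tax minister", "tax budget deficit",
]]
labels = ["football", "football", "cricket", "cricket",
          "elections", "elections", "fiscal", "fiscal"]
parent_of = {"football": "sports", "cricket": "sports",
             "elections": "politics", "fiscal": "politics"}

flat = MultinomialNB().fit(docs, labels)
hier = HierarchicalNB(parent_of).fit(docs, labels)
print(flat.predict(["tax", "minister"]))  # fiscal
print(hier.predict(["tax", "minister"]))  # fiscal (routed via "politics")
```

The flat model compares all leaf classes at once, while the hierarchical model splits the decision so each child classifier only discriminates among siblings using statistics from its own subtree. On a corpus this small both give the same answers; the sketch only shows the mechanics and makes no claim about the accuracy difference the paper reports.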
Keywords :
Bayes methods; data mining; pattern classification; text analysis; UCI repository dataset; deceptively simplistic model; document categorization; flat classification technique; hierarchical classification technique; maximal probability; multiclass document classification; multilabel document classification; naive Bayes approach; text document; Accuracy; Conferences; Testing; Text categorization; Training; Vocabulary; Data Mining; Document Classification; Hierarchical Classification; Multi-class Classification; Multi-label Classification; Naïve Bayes classifier
Conference_Title :
Computational Intelligence and Communication Networks (CICN), 2011 International Conference on
Conference_Location :
Gwalior
Print_ISBN :
978-1-4577-2033-8
DOI :
10.1109/CICN.2011.50