• DocumentCode
    2769728
  • Title

    Categorizing the Document Using Multi Class Classification in Data Mining

  • Author

    Joshi, Shweta ; Nigam, Bhawna

  • Author_Institution
    Dept. of Comput. Eng., Devi Ahilya Vishwavidyalaya, Indore, India
  • fYear
    2011
  • fDate
    7-9 Oct. 2011
  • Firstpage
    251
  • Lastpage
    255
  • Abstract
    Classification is the process of dividing data into a number of groups, which may be dependent on or independent of each other, with each group acting as a class. The task can be performed with several methods and different types of classifiers. Classification becomes harder, however, when it is applied to text documents, that is, document classification. The main purpose of this paper is to analyze the task of multi-class document classification and to learn how high classification accuracy can be achieved in the context of text documents. The Naive Bayes approach is used to deal with the problem of document classification via a deceptively simplistic model: assume all features are independent of one another, and compute the class of a document based on maximal probability. The Naive Bayes approach is applied in a flat (linear) and a hierarchical manner to improve the efficiency of the classification model. It has been found that the hierarchical classification technique is more effective than flat classification. It also performs better in the case of multi-label document classification. The dataset used for evaluation was collected from the UCI repository, with some modifications made by the authors.
  • Keywords
    Bayes methods; data mining; pattern classification; text analysis; UCI repository dataset; data mining; deceptively simplistic model; document categorization; flat classification technique; hierarchical classification technique; maximal probability; multiclass document classification; multilabel document classification; naive Bayes approach; text document; Accuracy; Conferences; Testing; Text categorization; Training; Vocabulary; Data Mining; Document Classification; Hierarchical Classification; Multi-class Classification; Multi-label Classification; Naïve Bayes classifier; Text categorization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence and Communication Networks (CICN), 2011 International Conference on
  • Conference_Location
    Gwalior
  • Print_ISBN
    978-1-4577-2033-8
  • Type

    conf

  • DOI
    10.1109/CICN.2011.50
  • Filename
    6112865
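
The abstract describes Naive Bayes applied in a flat and in a hierarchical manner. The sketch below illustrates that idea only; it is not the authors' implementation. It assumes a multinomial model with add-one (Laplace) smoothing and a two-level hierarchy, neither of which is stated in the record. The function names, the `parent_of` mapping, and the toy documents are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """Collect counts from docs, a list of (tokens, label) pairs."""
    class_docs = Counter()              # number of documents per class
    word_counts = defaultdict(Counter)  # token counts per class
    vocab = set()
    for tokens, label in docs:
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def predict_nb(model, tokens):
    """Return the class with maximal log-probability, treating tokens as independent."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        score = math.log(class_docs[label] / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # tokens never seen in training are skipped
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def train_hier(docs, parent_of):
    """Train one top-level model over parent classes and one model per parent."""
    top = train_nb([(tokens, parent_of[label]) for tokens, label in docs])
    children = {}
    for parent in set(parent_of.values()):
        subset = [(t, l) for t, l in docs if parent_of[l] == parent]
        children[parent] = train_nb(subset)
    return top, children


def predict_hier(model, tokens):
    """Pick the parent class first, then a leaf class among its children."""
    top, children = model
    parent = predict_nb(top, tokens)
    return predict_nb(children[parent], tokens)


# Toy data, invented for illustration (not the UCI dataset used in the paper).
docs = [
    ("goal match team".split(), "football"),
    ("wicket bat team".split(), "cricket"),
    ("cpu chip memory".split(), "hardware"),
    ("code compiler bug".split(), "software"),
]
parent_of = {
    "football": "sport", "cricket": "sport",
    "hardware": "computing", "software": "computing",
}

flat_model = train_nb(docs)
hier_model = train_hier(docs, parent_of)
print(predict_nb(flat_model, "wicket bat".split()))
print(predict_hier(hier_model, "compiler bug".split()))
```

In the flat variant a single model chooses among all leaf classes at once. In the hierarchical variant each decision is made among fewer, more closely related classes, which is one plausible reading of why the abstract reports it as more effective.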