  • DocumentCode
    3059718
  • Title
    Naïve Bayes text classification with positive features selected by statistical method
  • Author
    Meena, M. Janaki; Chandran, K.R.
  • Author_Institution
    Dept. of CSE, PSG Coll. of Technol., Coimbatore, India
  • fYear
    2009
  • fDate
    13-15 Dec. 2009
  • Firstpage
    28
  • Lastpage
    33
  • Abstract
    Text classification continues to be one of the most researched problems because of the continuously increasing amount of electronic documents and digital data. Naive Bayes is a simple and effective classifier for data mining tasks, but it does not perform satisfactorily in automatic text classification. In this paper, the performance of the naive Bayes classifier is analyzed by training it with only the positive features selected by CHIR, a statistics-based method. Feature selection is the most important preprocessing step: it improves the efficiency and accuracy of text classification algorithms by removing redundant and irrelevant terms from the training corpus. Experiments were conducted on randomly selected training sets, and the performance of the classifier with words as features was analyzed. The proposed method achieves higher classification accuracy than other native methods on the 20 Newsgroups benchmark.
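    The pipeline in the abstract (score terms with CHIR, keep only positively dependent terms, train naive Bayes on them) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: CHIR is taken here as our reading of the rχ² statistic, where R = observed/expected count of class-c documents containing term w, and only classes with R > 1 (positive dependency) contribute their χ² weighted by their share of R. Document-level counting, the top-k cutoff, add-one smoothing, the toy data, and the helper names `chir_scores`, `select_features`, `train_nb` and `predict` are all hypothetical choices for illustration.

```python
import math
from collections import Counter, defaultdict


def chir_scores(docs, labels):
    """CHIR-style term scores (assumed formulation; see lead-in).

    R = observed / expected count of class-c documents containing w.
    Only classes with R > 1 (positive term-class dependency) contribute,
    each contributing its chi-square weighted by its share of R.
    Terms with no positively dependent class get no score at all.
    """
    n = len(docs)
    class_size = Counter(labels)
    df = Counter()               # number of documents containing w
    df_c = defaultdict(Counter)  # number of class-c documents containing w
    for tokens, c in zip(docs, labels):
        for w in set(tokens):
            df[w] += 1
            df_c[c][w] += 1

    scores = {}
    for w, n_w in df.items():
        positive = []  # (R, chi2) for classes positively dependent on w
        for c, n_c in class_size.items():
            a = df_c[c][w]       # in c, contains w
            b = n_w - a          # not in c, contains w
            m = n_c - a          # in c, lacks w
            d = n - n_w - m      # not in c, lacks w
            r = a * n / (n_w * n_c)
            if r <= 1:
                continue
            denom = (a + m) * (b + d) * (a + b) * (m + d)
            chi2 = n * (a * d - b * m) ** 2 / denom if denom else 0.0
            positive.append((r, chi2))
        if positive:
            total_r = sum(r for r, _ in positive)
            scores[w] = sum(r / total_r * chi2 for r, chi2 in positive)
    return scores


def select_features(scores, k):
    """Top-k terms by score; ties broken alphabetically for determinism."""
    return set(sorted(scores, key=lambda w: (-scores[w], w))[:k])


def train_nb(docs, labels, vocab, alpha=1.0):
    """Multinomial naive Bayes with add-alpha smoothing, restricted to vocab."""
    n = len(docs)
    counts, totals = defaultdict(Counter), Counter()
    for tokens, c in zip(docs, labels):
        kept = [t for t in tokens if t in vocab]
        counts[c].update(kept)
        totals[c] += len(kept)
    priors = {c: math.log(cnt / n) for c, cnt in Counter(labels).items()}
    v = len(vocab)
    loglik = {
        c: {w: math.log((counts[c][w] + alpha) / (totals[c] + alpha * v))
            for w in vocab}
        for c in priors
    }
    return priors, loglik


def predict(model, tokens):
    """Most probable class; tokens outside the selected vocabulary are ignored."""
    priors, loglik = model
    return max(priors, key=lambda c: priors[c] + sum(
        loglik[c][t] for t in tokens if t in loglik[c]))


# Toy usage (made-up data, not from the paper).
docs = [s.split() for s in (
    "the team won the hockey game",
    "the goalie saved the puck",
    "the rocket launched into orbit",
    "the shuttle reached orbit",
)]
labels = ["sport", "sport", "space", "space"]
scores = chir_scores(docs, labels)   # "the" occurs everywhere: R = 1, no score
vocab = select_features(scores, k=6)
model = train_nb(docs, labels, vocab)
print(predict(model, "the puck and the goalie".split()))  # sport
```

    Note how a term spread evenly across all classes (here "the") gets R = 1 and is never scored, so it cannot enter the feature set; this is the "positive features only" idea in miniature. On a real corpus such as 20 Newsgroups, k would be tuned rather than fixed.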
  • Keywords
    Bayes methods; data mining; statistical analysis; text analysis; CHIR; data mining; digital data; electronic documents; naive Bayes text classification; positive feature selection; statistical method; Bayesian methods; Data mining; Educational institutions; Information filtering; Information filters; Performance analysis; Statistical analysis; Support vector machine classification; Support vector machines; Text categorization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advanced Computing, 2009. ICAC 2009. First International Conference on
  • Conference_Location
    Chennai
  • Print_ISBN
    978-1-4244-4786-2
  • Electronic_ISBN
    978-1-4244-4787-9
  • Type
    conf
  • DOI
    10.1109/ICADVC.2009.5378273
  • Filename
    5378273