• DocumentCode
    3096226
  • Title

    A hybrid algorithm for text classification based on rough set

  • Author

    Deng, Weibin

  • Author_Institution
    Key Lab. of Electron. Commerce & Modern Logistics, Chongqing Univ. of Posts & Telecommun., Chongqing, China
  • Volume
    1
  • fYear
    2011
  • fDate
    11-13 March 2011
  • Firstpage
    406
  • Lastpage
    410
  • Abstract
    Text classification has become one of the key subjects in intelligent information processing. Owing to the complexity of natural language, the feature space is typically very high-dimensional, and improving classification accuracy is an important and hard problem. Because rough set theory is a useful tool for dealing with uncertain information, this paper proposes a hybrid algorithm for text classification based on rough sets. Rough set theory divides a set into a positive region, a negative region, and a boundary region. In the first stage, the documents are therefore divided into certain classes and a doubt set. In the second stage, the documents in the doubt set are classified further, based on the attribute importance degree theory in the information view of rough sets. We find that most of the documents can be classified with high accuracy in the first stage, and that the conditional independence assumption of naïve Bayes is relaxed to some extent in the second stage. Simulation results on general data sets, compared with naïve Bayes, support vector machine, and k-nearest neighbor classifiers, illustrate the efficiency of the algorithm.
  • Keywords
    natural languages; pattern classification; rough set theory; hybrid algorithm; intelligent information processing; natural language; support vector machine; text classification; Accuracy; Algorithm design and analysis; Classification algorithms; Feature extraction; Niobium; Support vector machines; Text categorization; KNN; SVM; rough set; weighted naïve Bayes
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Research and Development (ICCRD), 2011 3rd International Conference on
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-1-61284-839-6
  • Type

    conf

  • DOI
    10.1109/ICCRD.2011.5764046
  • Filename
    5764046
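
The first stage described in the abstract, splitting documents into a positive region, a negative region, and a boundary region (the "doubt set"), can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `rough_regions`, the binary feature tuples, and the toy documents are all assumptions made for illustration. It only shows the standard rough set construction, where documents with identical feature values are indiscernible and each such group is assigned to a region as a whole.

```python
from collections import defaultdict


def rough_regions(objects, target):
    """Split object ids into positive, negative, and boundary regions.

    objects: dict mapping an id to a tuple of condition attribute values
             (hypothetical binary term features in the example below).
    target:  set of ids known to belong to the class of interest.
    """
    # Objects with identical attribute values are indiscernible,
    # so they fall into the same equivalence class.
    classes = defaultdict(set)
    for obj_id, attrs in objects.items():
        classes[tuple(attrs)].add(obj_id)

    positive, negative, boundary = set(), set(), set()
    for members in classes.values():
        if members <= target:
            # Every member is in the class: certainly belongs.
            positive |= members
        elif members.isdisjoint(target):
            # No member is in the class: certainly does not belong.
            negative |= members
        else:
            # Mixed evidence: these form the doubt set.
            boundary |= members
    return positive, negative, boundary


# Toy data (assumed for illustration): two binary features per document.
docs = {
    "d1": (1, 0),
    "d2": (1, 0),
    "d3": (0, 1),
    "d4": (0, 1),
    "d5": (1, 1),
}
in_class = {"d1", "d2", "d3"}

pos, neg, bnd = rough_regions(docs, in_class)
print(sorted(pos), sorted(neg), sorted(bnd))
```

In this toy example, `d3` and `d4` share the same features but only `d3` is in the class, so both land in the boundary region; under the scheme in the abstract, such documents would be passed to the second-stage classifier.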