• DocumentCode
    2020271
  • Title

    Using Modified CHI Square and Rough Set for Text Categorization with Many Redundant Features

  • Author

    DAI, Liuling ; HU, Jinwu ; Liu, WanChun

  • Author_Institution
    Sch. of Comput. Sci., Beijing Inst. of Technol., Beijing
  • Volume
    1
  • fYear
    2008
  • fDate
    17-18 Oct. 2008
  • Firstpage
    182
  • Lastpage
    185
  • Abstract
    Text categorization is a key problem in text mining. Although there has been much research on this problem, most work focuses on classification into broad categories. Very little research addresses text categorization problems characterised by many redundant features. We call this kind of problem fine-text-categorization. In this paper, we present an algorithm based on modified CHI square feature selection and rough sets to solve this problem. The features of each category are selected in an aggressive manner, and the classification rules are extracted using rough set theory. Experiments on real-world corpora show that our algorithm can clearly improve classification precision and is therefore promising.
  • Keywords
    feature extraction; rough set theory; text analysis; CHI square feature selection; redundant features; rough set; text categorization; text mining; Competitive intelligence; Computational intelligence; Computer science; Information retrieval; Information technology; Laboratories; Machine learning algorithms; Partial response channels; Support vector machines; SVM; feature selection
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence and Design, 2008. ISCID '08. International Symposium on
  • Conference_Location
    Wuhan
  • Print_ISBN
    978-0-7695-3311-7
  • Type

    conf

  • DOI
    10.1109/ISCID.2008.178
  • Filename
    4725586