• DocumentCode
    2334930
  • Title
    A new approach to feature selection in text classification
  • Author
    Wang, Yi ; Wang, Xiao-Jing
  • Author_Institution
    Chengdu Inst. of Comput. Appl., Chinese Acad. of Sci., Chengdu, China
  • Volume
    6
  • fYear
    2005
  • fDate
    18-21 Aug. 2005
  • Firstpage
    3814
  • Abstract
    Text classification is the automatic assignment of predefined categories to free text, and it is important to information retrieval and many other applications. Its first key step is to represent a text effectively, so that its content characterizes it as belonging to a specified category; a good representation makes the subsequent phases of classifier training and use more effective and efficient with respect to final classification performance. This paper proposes a new feature selection method, called the variance-mean based feature filtering method, that performs feature reduction in the representation phase of text classification. It keeps the best features and thus improves final performance (e.g., macro-F1 rises to 0.92), while dramatically decreasing the computing time needed to represent incoming text awaiting classification, which matters because this step occurs online and is time-critical. The gains in effectiveness and efficiency are especially evident for Chinese-language text.
  • Keywords
    classification; feature extraction; learning (artificial intelligence); text analysis; Chinese language text; classifier training; feature selection; information retrieval; text classification; variance-mean based feature filtering method; Computer applications; Electronic mail; Indexing; Information filtering; Information filters; Information retrieval; Machine learning; Natural languages; Pattern recognition; Text categorization; Feature filtering; feature reduction; feature selection; text classification; text representation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on
  • Conference_Location
    Guangzhou, China
  • Print_ISBN
    0-7803-9091-1
  • Type
    conf
  • DOI
    10.1109/ICMLC.2005.1527604
  • Filename
    1527604