• DocumentCode
    1856686
  • Title

    Feature selection in text categorization using the Baldwin effect

  • Author

    Yu, Edmund S. ; Liddy, Elizabeth D.

  • Author_Institution
    CIS, Syracuse Univ., NY, USA
  • Volume
    4
  • fYear
    1999
  • fDate
    1999
  • Firstpage
    2924
  • Abstract
    Text categorization is the problem of automatically assigning predefined categories to natural language texts. A major difficulty of this problem stems from the high dimensionality of its feature space. Reducing the dimensionality, or selecting a good subset of features, without sacrificing accuracy, is of great importance for neural networks to be successfully applied to the area. In this paper, we propose a neuro-genetic approach to feature selection in text categorization. Candidate feature subsets are evaluated by using three-layer feedforward neural networks. The Baldwin effect concerns the tradeoffs between learning and evolution. It is used in our research to guide and improve the GA-based evolution of the feature subsets. Experimental results show that our neuro-genetic algorithm is able to perform as well as, if not better than, the best results of neural networks to date, while using fewer input features.
  • Keywords
    feature extraction; feedforward neural nets; genetic algorithms; multilayer perceptrons; text analysis; Baldwin effect; GA-based evolution; dimensionality reduction; feature selection; feature subset selection; learning; natural language texts; neuro-genetic approach; text categorization; three-layer feedforward neural networks; Computational Intelligence Society; Data mining; Filters; Genetic algorithms; Humans; Indexing; Information retrieval; Natural languages; Neural networks; Text categorization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks, 1999. IJCNN '99. International Joint Conference on
  • Conference_Location
    Washington, DC
  • ISSN
    1098-7576
  • Print_ISBN
    0-7803-5529-6
  • Type

    conf

  • DOI
    10.1109/IJCNN.1999.833550
  • Filename
    833550
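
The abstract describes evolving feature subsets with a genetic algorithm while a neural network evaluates each candidate. The record gives no algorithmic details, so the following is only a minimal sketch of that general idea, not the authors' method. Everything in it is an assumption made for illustration: the synthetic data, the single-layer perceptron standing in for the three-layer feedforward network, the size penalty, and all function names and parameter values. The Baldwinian aspect it tries to show is that each candidate's fitness is measured after a short period of learning, while the learned weights are discarded and only the feature mask is inherited.

```python
import random


def make_data(rng, n_samples=120, n_features=12, informative=(0, 1, 2)):
    """Synthetic binary task (an assumption): the label depends only on
    the `informative` features; the remaining features are noise."""
    data = []
    for _ in range(n_samples):
        x = [rng.choice((0.0, 1.0)) for _ in range(n_features)]
        y = 1 if sum(x[i] for i in informative) >= 2 else 0
        data.append((x, y))
    return data


def predict(w, b, idx, x):
    """Threshold unit over the selected feature indices `idx`."""
    return 1 if sum(wk * x[i] for wk, i in zip(w, idx)) + b > 0 else 0


def train_and_score(mask, train, test, epochs=10, lr=0.1):
    """Lifetime learning: train a perceptron from scratch on the masked
    features and return its test accuracy. The weights are not kept."""
    idx = [i for i, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0
    w, b = [0.0] * len(idx), 0.0
    for _ in range(epochs):
        for x, y in train:
            err = y - predict(w, b, idx, x)
            if err:
                for k, i in enumerate(idx):
                    w[k] += lr * err * x[i]
                b += lr * err
    return sum(predict(w, b, idx, x) == y for x, y in test) / len(test)


def fitness(mask, train, test, penalty=0.01):
    """Accuracy after learning, minus a small cost per selected feature."""
    return train_and_score(mask, train, test) - penalty * sum(mask)


def evolve(train, test, n_features, rng, pop_size=20, generations=15, p_mut=0.05):
    """Simple generational GA over bit masks with elitism, one-point
    crossover, and bit-flip mutation. Only masks are passed on."""
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda m: fitness(m, train, test), reverse=True)
        parents = ranked[: pop_size // 2]
        children = [ranked[0][:]]  # elitism: carry the best mask over unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)
            child = [bit ^ (rng.random() < p_mut) for bit in a[:cut] + b[cut:]]
            children.append(child)
        pop = children
    return max(pop, key=lambda m: fitness(m, train, test))


rng = random.Random(0)
data = make_data(rng)
train, test = data[:80], data[80:]
best = evolve(train, test, n_features=12, rng=rng)
print("selected features:", [i for i, bit in enumerate(best) if bit])
```

Because the penalty term rewards smaller masks, the search is pushed toward subsets that keep accuracy while dropping inputs, which mirrors the goal stated in the abstract. A real text categorization setup would replace the synthetic data with term features and the perceptron with a trained multilayer network.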