• DocumentCode
    2804066
  • Title

    Combining Multiple Feature Selection Methods for Text Categorization by Using Rank-Score Characteristics

  • Author

    Li, Yanjun ; Hsu, D. Frank ; Chung, Soon M.

  • Author_Institution
    Dept. of Comput. & Inf. Sci., Fordham Univ., Bronx, NY, USA
  • fYear
    2009
  • fDate
    2-4 Nov. 2009
  • Firstpage
    508
  • Lastpage
    517
  • Abstract
    Feature selection is an important method for improving the efficiency and accuracy of text categorization algorithms by removing redundant and irrelevant terms from the corpus. Extensive research has been done to improve the performance of individual feature selection methods, but not much on their combinations. In this paper, we propose a method of combining multiple feature selection methods by using combinatorial fusion analysis (CFA). A rank-score function and its graph, called the rank-score graph, are adopted to measure the diversity of different feature selection methods. We show that a combination of multiple feature selection methods can outperform a single method only if each individual feature selection method has unique scoring behavior and relatively high performance. Moreover, it is shown that the rank-score function and rank-score graph are useful for the selection of a combination of feature selection methods.
  • Keywords
    data mining; graph theory; text analysis; combinatorial fusion analysis; multiple feature selection methods; rank-score graph; text categorization; Artificial intelligence; Computer science; Diversity reception; Frequency estimation; Functional analysis; Information science; Mutual information; Text categorization; Text mining; USA Councils; Feature selection; combinatorial fusion analysis (CFA); rank combination; rank-score function; score combination; text categorization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Tools with Artificial Intelligence, 2009. ICTAI '09. 21st International Conference on
  • Conference_Location
    Newark, NJ
  • ISSN
    1082-3409
  • Print_ISBN
    978-1-4244-5619-2
  • Electronic_ISBN
    1082-3409
  • Type

    conf

  • DOI
    10.1109/ICTAI.2009.129
  • Filename
    5362606
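
The rank-score function and the rank and score combinations named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the min-max normalization, the mean-absolute-difference diversity measure, the simple averaging of two methods, and all names and sample scores below are assumptions chosen for illustration only.

```python
from typing import Dict, List

Scores = Dict[str, float]  # term -> score assigned by one feature selection method

def normalize(scores: Scores) -> Scores:
    """Min-max scale scores into [0, 1] (an assumed normalization)."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {t: (s - lo) / span if span else 0.0 for t, s in scores.items()}

def ranks(scores: Scores) -> Dict[str, int]:
    """Rank 1 is the highest-scoring term; ties are broken alphabetically."""
    ordered = sorted(scores, key=lambda t: (-scores[t], t))
    return {t: i + 1 for i, t in enumerate(ordered)}

def rank_score_function(scores: Scores) -> List[float]:
    """Entry i - 1 holds the normalized score of the term at rank i."""
    return sorted(normalize(scores).values(), reverse=True)

def diversity(a: Scores, b: Scores) -> float:
    """Mean absolute gap between two rank-score functions (assumed measure)."""
    fa, fb = rank_score_function(a), rank_score_function(b)
    return sum(abs(x - y) for x, y in zip(fa, fb)) / len(fa)

def score_combination(a: Scores, b: Scores) -> Scores:
    """Average of normalized scores; higher is better."""
    na, nb = normalize(a), normalize(b)
    return {t: (na[t] + nb[t]) / 2 for t in na}

def rank_combination(a: Scores, b: Scores) -> Dict[str, float]:
    """Average of ranks; lower is better."""
    ra, rb = ranks(a), ranks(b)
    return {t: (ra[t] + rb[t]) / 2 for t in ra}

# Hypothetical scores from two feature selection methods over the same terms.
method_a = {"ball": 0.9, "game": 0.6, "team": 0.5, "the": 0.1}
method_b = {"ball": 4.0, "game": 8.0, "team": 7.0, "the": 1.0}

if __name__ == "__main__":
    print("rank-score A:", rank_score_function(method_a))
    print("rank-score B:", rank_score_function(method_b))
    print("diversity:", diversity(method_a, method_b))
    print("score combination:", score_combination(method_a, method_b))
    print("rank combination:", rank_combination(method_a, method_b))
```

Plotting each list returned by `rank_score_function` against rank gives a rank-score graph. Under these assumptions, two methods whose curves lie far apart have different scoring behavior, which the abstract identifies as one condition for a combination to outperform a single method.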