• DocumentCode
    3055422
  • Title
    A feature selection algorithm with redundancy reduction for text classification
  • Author
    Saleh, Sherine Nagi ; El-Sonbaty, Yasser
  • Author_Institution
    Arab Acad. for Sci., Alexandria
  • fYear
    2007
  • fDate
    7-9 Nov. 2007
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Document classification assigns documents to predefined categories according to their content. One of its main problems is the high dimensionality of the data. To overcome this problem, feature selection is required, which reduces the number of features and thus improves classification accuracy. In this paper, a new feature selection algorithm for multi-label document classification is presented. The algorithm focuses on reducing redundant features using the concept of minimal redundancy maximal relevance, which is based on the mutual information measure. The features selected by the proposed algorithm are then input to one of two classifiers: the multinomial naive Bayes classifier or the linear-kernel support vector machine. Experimental results on the Reuters dataset show that the proposed algorithm outperforms some recent algorithms presented in the literature on measures such as the F1-measure and the break-even point.
  • Keywords
    Bayes methods; classification; feature extraction; support vector machines; text analysis; Reuters dataset; feature selection algorithm; kernel support vector machines; multilabel document classification; multinomial naive Bayes classifier; mutual information measure; redundancy reduction; text classification; Educational institutions; Frequency; Gain measurement; Kernel; Mutual information; Performance evaluation; Support vector machine classification; Support vector machines; Testing; Text categorization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer and Information Sciences, 2007. ISCIS 2007. 22nd International Symposium on
  • Conference_Location
    Ankara
  • Print_ISBN
    978-1-4244-1363-8
  • Electronic_ISBN
    978-1-4244-1364-5
  • Type
    conf
  • DOI
    10.1109/ISCIS.2007.4456849
  • Filename
    4456849
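
The abstract above refers to the concept of minimal redundancy maximal relevance (mRMR) based on mutual information. As a rough illustration of that general concept only, and not of the authors' algorithm (whose details are not given in this record), the following minimal sketch greedily picks features by relevance to the class minus mean redundancy with the features already chosen. The function names `mutual_information` and `mrmr_select`, the binary term-presence features, and the toy data are all assumptions made for this example.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information, in bits, between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def mrmr_select(features, labels, k):
    """Greedy mRMR sketch (difference form), hypothetical helper.

    features: dict mapping a feature name to its list of discrete values
    labels:   list of class labels, one per document
    k:        number of features to select
    """
    relevance = {name: mutual_information(col, labels)
                 for name, col in features.items()}
    selected = []
    remaining = list(features)  # keeps insertion order, so ties are deterministic

    def score(name):
        # Relevance to the class minus mean redundancy with chosen features.
        if not selected:
            return relevance[name]
        redundancy = sum(mutual_information(features[name], features[s])
                         for s in selected) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (invented for illustration): 8 documents, binary term presence.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "oil":   [1, 1, 1, 0, 0, 0, 0, 0],
    "crude": [1, 1, 1, 0, 0, 0, 0, 0],  # exact duplicate of "oil"
    "opec":  [0, 0, 0, 1, 0, 0, 0, 0],
}
print(mrmr_select(features, labels, 2))
```

In this toy setup "crude" is as relevant as "oil" but fully redundant with it, so after "oil" is chosen the redundancy penalty pushes the selection toward "opec", which is less relevant on its own but carries different information.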