• DocumentCode
    525679
  • Title
    Complete Gini-Index Text (GIT) feature-selection algorithm for text classification
  • Author
    Park, Heum ; Kwon, Soonho ; Kwon, Hyuk-Chul
  • Author_Institution
    Dept. of Comput. Sci., Pusan Nat. Univ., Busan, South Korea
  • fYear
    2010
  • fDate
    23-25 June 2010
  • Firstpage
    366
  • Lastpage
    371
  • Abstract
    The recently introduced Gini-Index Text (GIT) feature-selection algorithm for text classification, which incorporates an improved Gini Index for better feature-selection performance, has some drawbacks. Specifically, under real-world experimental conditions, the algorithm concentrates feature values at one point and is inadequate for selecting representative features. As a result, good representative features cannot be estimated, and good performance cannot be achieved in unbalanced text classification. Therefore, we propose a new complete GIT feature-selection algorithm for text classification. According to experimental results, the new algorithm obtains unbiased feature values and eliminates many irrelevant and redundant features from feature subsets while retaining many representative features. Furthermore, compared with the original version, the new algorithm demonstrates notably improved overall classification performance.
  • Keywords
    pattern classification; text analysis; GIT feature-selection algorithm; Gini-Index text; feature subsets; representative features; text classification; unbiased feature values; Artificial intelligence; Classification algorithms; Computer science; Entropy; Information filtering; Information filters; Mutual information; Support vector machine classification; Support vector machines; Text categorization; Gini-Index; feature selection
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Software Engineering and Data Mining (SEDM), 2010 2nd International Conference on
  • Conference_Location
    Chengdu
  • Print_ISBN
    978-1-4244-7324-3
  • Electronic_ISBN
    978-89-88678-22-0
  • Type
    conf
  • Filename
    5542893