• DocumentCode
    617852
  • Title
    Information gain based dimensionality selection for classifying text documents
  • Author
    Wijayasekara, Dumidu; Manic, Milos; McQueen, Miles
  • Author_Institution
    Univ. of Idaho, Idaho Falls, ID, USA
  • fYear
    2013
  • fDate
    20-23 June 2013
  • Firstpage
    440
  • Lastpage
    445
  • Abstract
    Selecting the optimal dimensions for knowledge extraction applications is an essential component of data mining. Dimensionality selection techniques are used in classification applications to increase classification accuracy and reduce computational complexity. In text classification, where the dimensionality of the dataset is extremely high, dimensionality selection is even more important. This paper presents a novel genetic algorithm based methodology for dimensionality selection in text mining applications that utilizes information gain. The presented methodology uses the information gain of each dimension to dynamically change the mutation probability of chromosomes. Since the information gain is calculated a priori, the computational complexity is not affected. The presented method was tested on a specific text classification problem and compared with conventional genetic algorithm based dimensionality selection. The results show an improvement of 3% in true positives and 1.6% in true negatives over conventional dimensionality selection methods.
  • Keywords
    computational complexity; data mining; genetic algorithms; pattern classification; probability; text analysis; chromosomes mutation probability; classification applications; computational complexity reduction; genetic algorithm based dimensionality selection; genetic algorithm based methodology; information gain based dimensionality selection; knowledge extraction applications; text document classification; text mining applications; Dimensionality Selection; Information Gain; Text mining; Vulnerability Discovery
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Evolutionary Computation (CEC), 2013 IEEE Congress on
  • Conference_Location
    Cancun
  • Print_ISBN
    978-1-4799-0453-2
  • Electronic_ISBN
    978-1-4799-0452-5
  • Type
    conf
  • DOI
    10.1109/CEC.2013.6557602
  • Filename
    6557602
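
The abstract describes using the information gain of each dimension, computed once before the search, to adjust the mutation probability in a genetic algorithm. The record does not give the paper's exact formula, so the following is only a minimal sketch under stated assumptions: features are binary (term present or absent), a chromosome is a list of 0/1 bits marking selected dimensions, and the scaling in `mutation_probability` is a hypothetical choice for illustration, not the authors' method. All function names and the `base_rate` parameter are likewise illustrative.

```python
import math
import random


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(feature, labels):
    """Information gain of one binary feature with respect to the labels."""
    n = len(labels)
    present = [y for x, y in zip(feature, labels) if x]
    absent = [y for x, y in zip(feature, labels) if not x]
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


def mutation_probability(bit, gain, max_gain, base_rate):
    """Hypothetical scaling (an assumption, not taken from the paper):
    an unselected high-gain dimension is more likely to be switched on,
    and a selected high-gain dimension is less likely to be switched off."""
    rel = gain / max_gain if max_gain > 0 else 0.0
    if bit == 0:
        return base_rate * (0.5 + rel)
    return base_rate * (1.5 - rel)


def mutate(chromosome, gains, base_rate=0.05, rng=random):
    """Return a mutated copy of the chromosome. The gains are computed
    a priori, so the per-generation cost is the same as plain mutation."""
    max_gain = max(gains)
    child = []
    for bit, gain in zip(chromosome, gains):
        p = mutation_probability(bit, gain, max_gain, base_rate)
        child.append(1 - bit if rng.random() < p else bit)
    return child


if __name__ == "__main__":
    # Toy data: 4 documents, 2 binary term features, binary class labels.
    labels = [1, 1, 0, 0]
    features = [[1, 1, 0, 0],   # perfectly predictive term
                [1, 0, 1, 0]]   # uninformative term
    gains = [information_gain(f, labels) for f in features]
    print(gains)
    print(mutate([0, 1], gains, base_rate=0.05, rng=random.Random(0)))
```

In a full genetic algorithm, `mutate` would replace the uniform bit-flip operator, while selection, crossover, and the classifier-based fitness evaluation stay unchanged.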