• DocumentCode
    2910172
  • Title
    Using genetic algorithms in word-vector optimisation
  • Author
    Smith, Peter W H
  • Author_Institution
    Dept. of Comput., City Univ. London, London, UK
  • fYear
    2010
  • fDate
    8-10 Sept. 2010
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    Word vectors and sets of words are used in a wide range of text-based applications, yet these word sets are often chosen on an ad hoc basis. In this study, we examine two text-based applications that use word sets, authorship attribution and sentiment analysis, and find in both cases that classification performance, and precision in particular, can be improved using a fairly simple genetic algorithm. In authorship attribution, the trend in recent years has been towards ever larger word vectors. We suggest that this might be a counter-productive step, as it can easily lead to inaccuracy caused by overfitting or vector-space sparsity (the curse of dimensionality). In sentiment analysis, precision is the main issue, as rates greater than 80-85% are not easy to achieve.
  • Keywords
    genetic algorithms; pattern classification; text analysis; word processing; authorship attribution; classification performance; classification precision; genetic algorithm; sentiment analysis; sentiment analysis precision; text based application; vector space sparsity; word set; word vector optimisation; word vectors; Accuracy; Classification algorithms; Euclidean distance; Frequency measurement; Optimization; Presses; Support vector machine classification
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence (UKCI), 2010 UK Workshop on
  • Conference_Location
    Colchester
  • Print_ISBN
    978-1-4244-8774-5
  • Electronic_ISBN
    978-1-4244-8773-8
  • Type
    conf
  • DOI
    10.1109/UKCI.2010.5625589
  • Filename
    5625589
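
The abstract describes selecting word sets with a "fairly simple genetic algorithm" but gives no details of the encoding, operators, or fitness function. The following is a minimal sketch of one way such a search could look, not the paper's method. Everything in it is an assumption made for illustration: the bit-mask encoding of the word set, truncation selection with one-point crossover, the leave-one-out nearest-neighbour fitness on relative word frequencies with squared Euclidean distance, and the names `word_freqs`, `accuracy`, and `evolve`.

```python
import random

def word_freqs(text, words):
    """Relative frequency of each selected word in a text."""
    tokens = text.lower().split()
    n = max(len(tokens), 1)
    return [tokens.count(w) / n for w in words]

def accuracy(mask, vocab, docs):
    """Leave-one-out nearest-neighbour accuracy using only words enabled in mask."""
    words = [w for w, bit in zip(vocab, mask) if bit]
    if not words:
        return 0.0  # an empty word set cannot classify anything
    vecs = [(word_freqs(text, words), label) for text, label in docs]
    correct = 0
    for i, (v, label) in enumerate(vecs):
        others = vecs[:i] + vecs[i + 1:]
        # Squared Euclidean distance is enough for ranking neighbours.
        nearest = min(others, key=lambda o: sum((a - b) ** 2 for a, b in zip(v, o[0])))
        correct += nearest[1] == label
    return correct / len(vecs)

def evolve(vocab, docs, pop_size=20, generations=30, mutation_rate=0.05, seed=0):
    """Search for a word subset (bit mask over vocab) that maximises accuracy."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in vocab] for _ in range(pop_size)]
    fitness = lambda mask: accuracy(mask, vocab, docs)
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]  # truncation selection; the best survive unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(vocab))  # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit with probability mutation_rate.
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            children.append(child)
        pop = parents + children
    best = max(pop, key=fitness)
    return [w for w, bit in zip(vocab, best) if bit], fitness(best)

# Toy usage with made-up data (hypothetical, not from the paper).
vocab = ["good", "bad", "the", "film", "great", "awful"]
docs = [
    ("the film was good and great", "pos"),
    ("a great film good fun", "pos"),
    ("the film was bad and awful", "neg"),
    ("awful film bad acting", "neg"),
]
words, score = evolve(vocab, docs)
print(words, score)
```

Because the fitness here is measured on the same small data used for the search, a sketch like this can itself overfit; in practice the selected word set would need to be checked on held-out texts.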