Using genetic algorithms in word-vector optimisation

Author

Smith, Peter W H

Author_Institution

Dept. of Comput., City Univ. London, London, UK

fYear

2010

fDate

8-10 Sept. 2010

Firstpage

1

Lastpage

5

Abstract

Word vectors and sets of words are used in a wide range of text-based applications, yet these word sets are often chosen on an ad hoc basis. In this study, we examine two text-based applications that use word sets and, in both cases, find that classification performance can be optimised using a fairly simple genetic algorithm. The first application is authorship attribution and the second is sentiment analysis; in both, classification precision improves when the word set is optimised with a genetic algorithm. In authorship attribution, the recent trend has been towards ever larger word vectors. We suggest that this may be counter-productive, as larger vectors can easily lead to inaccuracy caused by overfitting or vector-space sparsity (the curse of dimensionality). In sentiment analysis, precision is the main issue, as rates above 80-85% are difficult to achieve.

Keywords

genetic algorithms; pattern classification; text analysis; word processing; authorship attribution; classification performance; classification precision; genetic algorithm; sentiment analysis; sentiment analysis precision; text based application; vector space sparsity; word set; word vector optimisation; word vectors; Accuracy; Classification algorithms; Euclidean distance; Frequency measurement; Optimization; Presses; Support vector machine classification;

fLanguage

English

Publisher

IEEE

Conference_Title

Computational Intelligence (UKCI), 2010 UK Workshop on

Conference_Location

Colchester

Print_ISBN

978-1-4244-8774-5

Electronic_ISBN

978-1-4244-8773-8

Type

conf

DOI

10.1109/UKCI.2010.5625589

Filename

5625589