DocumentCode
2910172
Title
Using genetic algorithms in word-vector optimisation
Author
Smith, Peter W H
Author_Institution
Dept. of Comput., City Univ. London, London, UK
fYear
2010
fDate
8-10 Sept. 2010
Firstpage
1
Lastpage
5
Abstract
Word vectors and sets of words are used in a wide range of text-based applications, yet these word sets are often chosen on an ad hoc basis. In this study, we examine two text-based applications that use word sets, authorship attribution and sentiment analysis, and find in both cases that classification precision can be improved by optimising the word set with a fairly simple genetic algorithm. In authorship attribution, the trend in recent years has been towards ever larger word vectors. We suggest that this might be a counter-productive step, as it can easily lead to inaccuracy caused by overfitting or vector-space sparsity (the curse of dimensionality). In sentiment analysis, precision is the main issue, as rates greater than 80-85% are not easy to achieve.
Keywords
genetic algorithms; pattern classification; text analysis; word processing; authorship attribution; classification performance; classification precision; genetic algorithm; sentiment analysis; sentiment analysis precision; text based application; vector space sparsity; word set; word vector optimisation; word vectors; Accuracy; Classification algorithms; Euclidean distance; Frequency measurement; Optimization; Presses; Support vector machine classification
fLanguage
English
Publisher
IEEE
Conference_Titel
Computational Intelligence (UKCI), 2010 UK Workshop on
Conference_Location
Colchester
Print_ISBN
978-1-4244-8774-5
Electronic_ISBN
978-1-4244-8773-8
Type
conf
DOI
10.1109/UKCI.2010.5625589
Filename
5625589