Title :
Effect of different feature types on age based classification of short texts
Author_Institution :
Institute of Informatics, Tallinn University, Tallinn, Estonia
fDate :
7/1/2015
Abstract :
The aim of the current study is to compare the effect of three different feature types on age-based categorization of short texts, averaging 85 words per author. Besides widely used word and character n-grams, text readability features are proposed as an alternative. By readability features we mean various relative ratios of text elements, such as characters per word and words per sentence. Support Vector Machines, Logistic Regression, and Bayesian algorithms were used to build the models. The most effective features were readability features and character n-grams. The model generated by a Support Vector Machine with the combined feature set yielded an F-score of 0.968. An age prediction application was built using a model with readability features.
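A minimal sketch of how readability features of this kind (relative ratios of text elements) and character n-grams might be extracted from a short text, assuming a simple regular-expression tokenizer. Only characters per word and words per sentence come from the abstract; the long-word ratio (with an assumed 6-character threshold), commas per sentence, and the helper names readability_features and char_ngrams are hypothetical illustrations, not the authors' actual feature set or implementation.

```python
import re
from collections import Counter

# Assumed tokenization rules; the study's exact tokenizer is not stated.
SENTENCE_SPLIT = re.compile(r"[.!?]+")
WORD_PATTERN = re.compile(r"\w+")


def readability_features(text: str) -> dict[str, float]:
    """Hypothetical readability features: relative ratios of text elements."""
    words = WORD_PATTERN.findall(text)
    # Drop empty fragments left after the final sentence terminator.
    sentences = [s for s in SENTENCE_SPLIT.split(text) if s.strip()]
    n_words = len(words)
    n_sentences = max(len(sentences), 1)  # avoid division by zero
    n_chars = sum(len(w) for w in words)
    long_words = sum(1 for w in words if len(w) > 6)  # assumed threshold
    return {
        "chars_per_word": n_chars / n_words if n_words else 0.0,
        "words_per_sentence": n_words / n_sentences,
        "long_word_ratio": long_words / n_words if n_words else 0.0,
        "commas_per_sentence": text.count(",") / n_sentences,
    }


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Counts of overlapping, lowercased character n-grams."""
    t = text.lower()
    return Counter(t[i:i + n] for i in range(len(t) - n + 1))


if __name__ == "__main__":
    sample = "The cat sat. It was happy, very happy!"
    print(readability_features(sample))
    print(char_ngrams(sample).most_common(3))
```

For "The cat sat. It was happy, very happy!" the sketch gives 3.5 characters per word and 4.0 words per sentence. Vectors like these could then be passed to any of the classifier families named in the abstract (SVM, logistic regression, Bayesian); the toolkit used in the study is not specified here.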
Keywords :
"Feature extraction","Support vector machines","Indexes","Classification algorithms","Training","Logistics"
Conference_Title :
Information, Intelligence, Systems and Applications (IISA), 2015 6th International Conference on
DOI :
10.1109/IISA.2015.7388069