DocumentCode :
3734255
Title :
Effect of different feature types on age based classification of short texts
Author :
Avar Pentel
Author_Institution :
Institute of Informatics, Tallinn University, Tallinn, Estonia
fYear :
2015
fDate :
7/1/2015 12:00:00 AM
Firstpage :
1
Lastpage :
7
Abstract :
The aim of the current study is to compare the effect of three different feature types on age-based classification of short texts averaging 85 words per author. Besides the widely used word and character n-grams, text readability features are proposed as an alternative. By readability features we mean relative ratios of text elements, such as characters per word, words per sentence, etc. Support Vector Machines, Logistic Regression, and Bayesian algorithms were used to build models. The most effective features were readability features and character n-grams. The model generated by a Support Vector Machine with the combined feature set yielded an F-score of 0.968. An age prediction application was built using a model with readability features.
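The two ratios named in the abstract, characters per word and words per sentence, can be illustrated with a minimal sketch. This is not the author's implementation: the function name `readability_features`, the regex-based tokenization, and the sentence splitting on `.`, `!`, and `?` are all assumptions made for illustration, and the paper's full feature set is not listed in this record.

```python
import re


def readability_features(text):
    """Compute two illustrative readability ratios for a short text.

    Assumptions (not taken from the paper): sentences end at '.', '!'
    or '?', and a word is any run of letters.
    """
    # Naive sentence split; empty fragments are discarded.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Runs of letters only (digits and underscores are excluded).
    words = re.findall(r"[^\W\d_]+", text)

    n_words = len(words)
    n_sentences = len(sentences)
    n_chars = sum(len(w) for w in words)

    return {
        "chars_per_word": n_chars / n_words if n_words else 0.0,
        "words_per_sentence": n_words / n_sentences if n_sentences else 0.0,
    }


features = readability_features("I like cats. Dogs are fine too!")
empty_features = readability_features("")
```

A vector of such ratios per author could then be passed to any standard classifier; which ratios the study actually used, and how they were scaled, would have to be taken from the full paper.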
Keywords :
"Feature extraction","Support vector machines","Indexes","Classification algorithms","Training","Logistics"
Publisher :
IEEE
Conference_Titel :
Information, Intelligence, Systems and Applications (IISA), 2015 6th International Conference on
Type :
conf
DOI :
10.1109/IISA.2015.7388069
Filename :
7388069