Title of article :
Automating Survey Coding by Multiclass Text Categorization Techniques
Author/Authors :
Giorgetti, Daniela ; Sebastiani, Fabrizio
Issue Information :
Monthly, consecutive issue number, year 2003
Pages :
1269-
From page :
1269
To page :
(not recorded)
Abstract :
In this issue, Giorgetti and Sebastiani suggest that answers to open-ended questions in survey instruments can be coded automatically by classifiers that learn from training sets of manually coded answers. The only manual effort required is classifying a representative set of documents, rather than building a dictionary of words that trigger a code assignment. They use two text categorization techniques: a naive Bayesian probabilistic learner from McCallum's RAINBOW package and the multiclass support vector machine learner from Hsu and Lin's BSVM package. Data from the 1996 General Social Survey of the U.S. National Opinion Research Center provided answers to three questions (previously tested by Viechnicki using a dictionary approach), their manually assigned category codes, and the complete set of predefined category codes. The learners were run on three random, disjoint subsets of the answer sets to create the classifiers, and a remaining set was used as the test set. RAINBOW outperforms the dictionary approach by 18% and BSVM by 17%, while reducing the standard deviation of the results by 28% and 34%, respectively, relative to the dictionary approach.
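To make the learning-from-coded-answers idea concrete, here is a minimal sketch of a multinomial naive Bayes coder with add-one smoothing. It is not RAINBOW's implementation and does not cover the BSVM learner; the class name `NaiveBayesCoder`, the whitespace tokenizer, the category codes `ECONOMY`/`HEALTH`, and the toy answers are all hypothetical, and only a single train/test split is shown rather than the three random disjoint subsets used in the study.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical, deliberately simple tokenizer: lowercase and split on whitespace.
    return text.lower().split()


class NaiveBayesCoder:
    """Multinomial naive Bayes with Laplace (add-one) smoothing for assigning codes."""

    def fit(self, answers, codes):
        self.code_counts = Counter(codes)        # documents per code (for priors)
        self.word_counts = defaultdict(Counter)  # token counts per code
        self.vocab = set()
        for text, code in zip(answers, codes):
            tokens = tokenize(text)
            self.word_counts[code].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        self.n_docs = len(codes)
        return self

    def predict(self, text):
        # Pick the code with the highest log posterior (up to a constant).
        v = len(self.vocab)
        best_code, best_score = None, -math.inf
        for code, n in self.code_counts.items():
            score = math.log(n / self.n_docs)  # log prior
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # tokens never seen in training are ignored
                count = self.word_counts[code][tok]
                score += math.log((count + 1) / (self.total_words[code] + v))
            if score > best_score:
                best_code, best_score = code, score
        return best_code


def accuracy(model, answers, codes):
    hits = sum(model.predict(a) == c for a, c in zip(answers, codes))
    return hits / len(codes)


# Hypothetical manually coded training answers.
TRAIN = [
    ("lost my job and cannot pay rent", "ECONOMY"),
    ("prices are too high and wages too low", "ECONOMY"),
    ("no money for bills", "ECONOMY"),
    ("doctor visits cost too much", "HEALTH"),
    ("my mother is sick in hospital", "HEALTH"),
    ("worried about getting sick", "HEALTH"),
]
TEST = [("cannot pay my bills", "ECONOMY"), ("hospital and doctor", "HEALTH")]

train_answers, train_codes = zip(*TRAIN)
test_answers, test_codes = zip(*TEST)
model = NaiveBayesCoder().fit(train_answers, train_codes)
print(accuracy(model, test_answers, test_codes))  # prints 1.0
```

The design mirrors the abstract's point: the only human input is the list of coded answers, and no trigger-word dictionary is written by hand.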
Journal title :
Journal of the American Society for Information Science and Technology
Serial Year :
2003
Record number :
35109