Title of article :
Automatic categorization of fanatic texts using random forests
Author/Authors :
KLEMA, JIRI (CTU Prague - Department of Cybernetics, Czech Republic); ALMONAYYES, AHMAD (Kuwait University - Dept. of Mathematics and Computer Science, Kuwait)
From page :
1
To page :
18
Abstract :
This paper presents a study of the classification and analysis of fanatic texts. The texts come from an Arabic-speaking environment in Kuwait, where teachers and students were asked questions about various terrorist tendencies. A domain expert assigned each response to one of three classes according to the degree of fanaticism in its content. The main task was to capture the expert's implicit knowledge and to distinguish the documents by their content. The documents are represented as bags of words. The paper applies learning algorithms that have proven effective in text classification (a TFIDF classifier and a multinomial probabilistic model), as well as the random forest classifier, which is known to cope well with domains described by a large number of features. A secondary task was to discover knowledge that helps in understanding the domain. For this reason, the final models were also analyzed and used to reveal the inherent structure of the document set (a sub-class structure) and to identify important words and their possible relations.
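The abstract names a TFIDF classifier over a bag-of-words representation without describing it. The following is a minimal sketch, assuming the common Rocchio-style formulation: each class is represented by the normalized sum of its training documents' TF-IDF vectors, and a new document is assigned to the class whose centroid has the highest cosine similarity. The record does not state which variant the authors used. All function names, the class labels "A", "B", "C" (stand-ins for the three expert-assigned classes), and the toy documents are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Bag-of-words tokenization: lowercase and keep word characters.
    # Real Arabic text would need normalization and stemming (assumption).
    return re.findall(r"\w+", text.lower())


def fit_idf(docs):
    # Inverse document frequency: log(N / number of documents containing the term).
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    n = len(docs)
    return {term: math.log(n / count) for term, count in df.items()}


def unit(vec):
    # Scale a sparse vector (dict) to unit Euclidean length.
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def tfidf_vector(tokens, idf):
    # Raw term frequency times IDF; terms unseen in training are ignored.
    tf = Counter(tokens)
    return unit({t: c * idf[t] for t, c in tf.items() if t in idf})


def fit_centroids(docs, labels, idf):
    # Hypothetical Rocchio-style centroids: sum of each class's TF-IDF vectors.
    sums = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for t, v in tfidf_vector(tokens, idf).items():
            sums[label][t] += v
    return {label: unit(dict(vec)) for label, vec in sums.items()}


def cosine(a, b):
    # Both vectors are unit length, so the dot product equals the cosine.
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def predict(text, idf, centroids):
    # Assign the class whose centroid is most similar to the document.
    vec = tfidf_vector(tokenize(text), idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical toy corpus with placeholder words, not data from the paper.
train = [
    "apple banana cherry", "apple cherry date",
    "elder fig grape", "fig grape honey",
    "kiwi lemon mango", "lemon mango nut",
]
labels = ["A", "A", "B", "B", "C", "C"]

train_tokens = [tokenize(t) for t in train]
idf = fit_idf(train_tokens)
centroids = fit_centroids(train_tokens, labels, idf)
print(predict("banana date", idf, centroids))  # → A
```

A multinomial probabilistic model (e.g. multinomial naive Bayes) or a random forest would consume the same bag-of-words counts; only the decision rule over the term vectors changes.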
Keywords :
Data Mining , Machine Learning , Random Forest , Text classification
Journal title :
Kuwait Journal of Science
Record number :
2573237