Title :
Genre Classification on German Novels
Author :
Lena Hettinger; Martin Becker; Isabella Reger; Fotis Jannidis; Andreas Hotho
Abstract :
The study of German literature is mostly based on literary canons, i.e., small sets of specifically chosen documents. In particular, the history of novels has been characterized using a set of only 100 to 250 works. In this paper, we address genre classification on a large set of novels using machine learning methods, in order to achieve a better understanding of the genres of novels. To this end, we explore how different types of features affect the performance of different classification algorithms. We employ commonly used stylometric features and evaluate two types of features not yet applied to genre classification, namely topic-based features and features based on social network graphs and character interaction. We compute these features on a data set of close to 1,700 novels either written in or translated into German. Even though topics are often considered orthogonal to genres, we find that topic-based features in combination with support vector machines achieve the best results. Overall, we successfully apply new feature types to genre classification of novels and outline directions for further research in this area.
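The abstract reports that topic-based features combined with support vector machines give the best results. As a rough illustration only, the following Python sketch shows what that final classification step could look like, assuming each novel has already been reduced to a vector of topic proportions (for example by a topic model). The three topics, the genre labels, and the numbers in TRAIN are invented toy values, not data from the paper. The Pegasos stochastic subgradient solver and the one-vs-rest scheme are stand-ins chosen to keep the example dependency-free; they are not necessarily what the authors used.

```python
import random

# Hypothetical toy data: each "novel" is a vector of topic proportions over
# three topics. These numbers are invented for illustration, not from the paper.
TRAIN = [
    ([0.8, 0.1, 0.1], "crime"),
    ([0.7, 0.2, 0.1], "crime"),
    ([0.1, 0.8, 0.1], "romance"),
    ([0.2, 0.7, 0.1], "romance"),
    ([0.1, 0.1, 0.8], "adventure"),
    ([0.1, 0.2, 0.7], "adventure"),
]


def with_bias(x):
    """Append a constant 1.0 so the linear model can learn an offset."""
    return list(x) + [1.0]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pegasos(data, lam=0.01, iters=50000, seed=0):
    """Binary linear SVM via Pegasos (stochastic subgradient on the hinge loss).

    data: list of (feature_vector, label) pairs with label in {+1, -1}.
    """
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    for t in range(1, iters + 1):
        x, y = data[rng.randrange(len(data))]
        eta = 1.0 / (lam * t)
        # The margin check uses the weights from before this step's update.
        violated = y * dot(w, x) < 1.0
        # Shrink toward zero (the regularization part of the subgradient) ...
        w = [(1.0 - eta * lam) * wi for wi in w]
        # ... and, on a margin violation, step toward the example.
        if violated:
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def train_one_vs_rest(samples, **kwargs):
    """Train one binary SVM per genre (this genre vs. all others)."""
    genres = sorted({label for _, label in samples})
    models = {}
    for g in genres:
        binary = [(with_bias(x), 1 if label == g else -1) for x, label in samples]
        models[g] = train_pegasos(binary, **kwargs)
    return models


def predict(models, x):
    """Return the genre whose binary SVM assigns the highest score."""
    xb = with_bias(x)
    return max(models, key=lambda g: dot(models[g], xb))


if __name__ == "__main__":
    models = train_one_vs_rest(TRAIN)
    print(predict(models, [0.75, 0.15, 0.10]))
```

Because the constant feature is appended to every vector, the bias is regularized together with the other weights, a common simplification in Pegasos-style sketches. On real data one would more likely use an established SVM library and evaluate with cross-validation; the hand-written solver here only makes the hinge-loss objective visible.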
Keywords :
"Feature extraction","Social network services","Context","Web pages","Error analysis","Data mining","Electronic mail"
Conference_Title :
2015 26th International Workshop on Database and Expert Systems Applications (DEXA)
Print_ISBN :
978-1-4673-7581-8
Electronic_ISSN :
2378-3915
DOI :
10.1109/DEXA.2015.62