Title :
Evaluating the effects of textual features on authorship attribution accuracy
Author :
Ramezani, Reza ; Sheydaei, Navid ; Kahani, Mohsen
Author_Institution :
Dept. of Comput. Eng., Ferdowsi Univ. of Mashhad, Mashhad, Iran
fDate :
Oct. 31 2013-Nov. 1 2013
Abstract :
Authorship attribution (AA), or author identification, is the problem of identifying the author of an unseen text. From a machine learning point of view, AA can be viewed as a multiclass, single-label text-categorization task. The task rests on the assumption that the author of an unseen text can be discriminated by comparing textual features extracted from that text with those of texts by known authors. This paper evaluates the effects of 29 different textual features on the accuracy of author identification on Persian corpora in 30 different scenarios. Several classification algorithms are applied to corpora with 2, 5, 10, 20, and 40 different authors, and their results are compared. The evaluation results show that information about the words and verbs used is the most reliable criterion for AA tasks, and that NLP-based features are more reliable than BOW-based features.
Keywords :
feature extraction; natural language processing; pattern classification; text analysis; Persian corpora; author identification; authorship attribution accuracy; classification algorithms; textual features evaluation; Accuracy; Classification algorithms; Feature extraction; Mood; Reliability; Support vector machines; Author Identification; Authorship Attribution; Classification; Data Mining; Persian Corpus; Textual Features;
Conference_Titel :
Computer and Knowledge Engineering (ICCKE), 2013 3th International eConference on
Conference_Location :
Mashhad
Print_ISBN :
978-1-4799-2092-1
DOI :
10.1109/ICCKE.2013.6682828
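
The abstract frames AA as a multiclass, single-label categorization task in which features of an unseen text are compared with those of texts by known authors. The following is a minimal sketch of that framing only, not the paper's method: it assumes a plain bag-of-words feature (relative word frequencies) and a cosine-similarity nearest-profile classifier, neither of which is taken from the paper. The function names (`features`, `train`, `attribute`) and the toy English data are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def features(text):
    """Relative word frequencies: an assumed, very simple BOW feature set."""
    words = text.lower().split()
    counts = Counter(words)
    total = len(words) or 1
    return {w: c / total for w, c in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse feature dictionaries."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train(labelled_texts):
    """Build one profile per author by pooling that author's known texts."""
    pooled = {}
    for author, text in labelled_texts:
        pooled.setdefault(author, []).append(text)
    return {a: features(" ".join(ts)) for a, ts in pooled.items()}


def attribute(profiles, unseen_text):
    """Single-label decision: the author whose profile is most similar."""
    f = features(unseen_text)
    return max(profiles, key=lambda a: cosine(profiles[a], f))


# Toy data (hypothetical); the paper itself uses Persian corpora.
known = [
    ("author_a", "the sea was calm and the sea was grey"),
    ("author_a", "the grey sea rolled under the calm sky"),
    ("author_b", "markets fell sharply as investors sold shares"),
    ("author_b", "investors bought shares when markets rose"),
]
profiles = train(known)
guess = attribute(profiles, "the calm grey sea")
print(guess)
```

Each author is one class and every unseen text receives exactly one label, which is what makes the task multiclass and single-label. Swapping `features` for a different extractor would be one way to compare feature sets under the same classifier.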