DocumentCode :
670524
Title :
Authorship verification for short messages using stylometry
Author :
Brocardo, Marcelo Luiz ; Traore, Issa ; Saad, Shatina ; Woungang, Isaac
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Victoria - UVIC, Victoria, BC, Canada
fYear :
2013
fDate :
7-8 May 2013
Firstpage :
1
Lastpage :
6
Abstract :
Authorship can be verified using stylometric techniques, which analyze the linguistic style and writing characteristics of authors. Stylometry is a behavioral feature that a person exhibits during writing; it can be extracted and potentially used to check the identity of the author of online documents. Although stylometric techniques can achieve high accuracy rates for long documents, identifying the author of a short document remains challenging, in particular when dealing with large author populations. These hurdles must be addressed for stylometry to be usable in checking the authorship of online messages such as emails, text messages, or Twitter feeds. In this paper, we take some steps toward that goal by proposing a supervised learning technique combined with n-gram analysis for authorship verification in short texts. Experimental evaluation on the Enron email dataset involving 87 authors yields very promising results, with an Equal Error Rate (EER) of 14.35% for message blocks of 500 characters.
Keywords :
authorisation; data mining; digital forensics; electronic mail; electronic messaging; human computer interaction; learning (artificial intelligence); social networking (online); text analysis; Enron email dataset; Twitter feeds; access control; author identity checking; authorship verification; emails; equal error rate; linguistic style analysis; n-gram analysis; online documents; short messages; stylometric techniques; stylometry; supervised learning technique; text messages; text mining; writing characteristic analysis; Accuracy; Electronic mail; Error analysis; Feature extraction; Support vector machines; Training; Training data; Authentication and access control; authorship verification; biometrics systems; classification; n-gram features; short message verification; stylometry; text mining; writeprint;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer, Information and Telecommunication Systems (CITS), 2013 International Conference on
Conference_Location :
Athens
Print_ISBN :
978-1-4799-0166-1
Type :
conf
DOI :
10.1109/CITS.2013.6705711
Filename :
6705711