DocumentCode :
679955
Title :
Authorship detection of SMS messages using unigrams
Author :
Ragel, Roshan ; Herath, P. ; Senanayake, Upul
Author_Institution :
Dept. of Comput. Eng., Univ. of Peradeniya, Peradeniya, Sri Lanka
fYear :
2013
fDate :
17-20 Dec. 2013
Firstpage :
387
Lastpage :
392
Abstract :
SMS messaging is a popular medium of communication. Because of its popularity and privacy, it could be used for many illegal purposes. Additionally, since SMSes are part of day-to-day life, they can be used as evidence in many legal disputes. Since a cellular phone might be accessible to people close to the owner, it is important to establish that the sender of a message is indeed the owner of the phone. For this purpose, the straightforward solution seems to be the use of popular stylometric methods. However, compared with the data used for stylometry in the literature, SMSes have unusual characteristics that make it hard or impossible to apply these methods in a conventional way. Our target is to devise a method of authorship detection for SMS messages that still gives usable accuracy. We argue that, among the methods of author attribution, the best one to apply to SMS messages is an n-gram method. To demonstrate this, we tested two different methods of distribution comparison with varying amounts of training and testing data. We specifically compare how well our algorithms work with little testing data and a large number of candidate authors (which we believe to be the real-world scenario) against controlled tests with fewer authors and selected SMSes containing a large number of words. To counter the lack of information in a single SMS message, we propose stacking together a few SMSes.
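The abstract does not name the two distribution-comparison methods it evaluates. As a hedged illustration only, the sketch below assumes simple lowercase word tokenization, relative unigram frequencies, and cosine similarity as the comparison measure; the function names (`unigram_profile`, `cosine_similarity`, `attribute`) are hypothetical, and this is not the authors' implementation. Passing several messages to `unigram_profile` stacks them into one longer profile, mirroring the stacking idea described above.

```python
import math
import re
from collections import Counter


def unigram_profile(messages):
    """Build a relative-frequency unigram profile from one or more SMSes.

    Passing several messages at once stacks them into a single longer text
    (assumption: simple lowercase word tokenization).
    """
    tokens = []
    for msg in messages:
        tokens.extend(re.findall(r"[a-z0-9']+", msg.lower()))
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tok: n / total for tok, n in counts.items()}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse frequency profiles (assumed measure)."""
    dot = sum(v * q.get(t, 0.0) for t, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def attribute(test_messages, training):
    """Return the candidate author whose training profile is most similar.

    `training` maps a hypothetical author name to that author's list of SMSes.
    """
    test_profile = unigram_profile(test_messages)
    best_author, best_score = None, -1.0
    for author, msgs in training.items():
        score = cosine_similarity(test_profile, unigram_profile(msgs))
        if score > best_score:
            best_author, best_score = author, score
    return best_author
```

For example, with a hypothetical training set where "alice" writes "gud mrng c u at 8" and "bob" writes "Good morning, see you at eight.", a stacked test sample such as `["c u", "l8r gud nite"]` is attributed to whichever author's unigram distribution it overlaps most. Stacking matters because a single short SMS may share few or no unigrams with any profile.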
Keywords :
cellular radio; electronic messaging; SMS messages; SMS messaging; author attribution; authorship detection; candidate authors; cellular phone; n-gram method; popular media; popular stylometric methods; stacking; straight forward solutions; stylometry; unigrams; Accuracy; Databases; Measurement; Pragmatics; Testing; Training; Vectors; SMS messaging; author attributing; stylometry; unigrams;
fLanguage :
English
Publisher :
ieee
Conference_Title :
Industrial and Information Systems (ICIIS), 2013 8th IEEE International Conference on
Conference_Location :
Peradeniya
Print_ISBN :
978-1-4799-0908-7
Type :
conf
DOI :
10.1109/ICIInfS.2013.6732015
Filename :
6732015