Title :
Authorship attribution using function words adjacency networks
Author :
Segarra, Santiago ; Eisen, Mark ; Ribeiro, Alejandro
Author_Institution :
Dept. of Electr. & Syst. Eng., Univ. of Pennsylvania, Philadelphia, PA, USA
Abstract :
We present an authorship attribution method based on relational data between function words, the content-independent words that help define grammatical relationships. As relational structures we use normalized word adjacency networks. We interpret these networks as Markov chains and compare them using entropy measures. We illustrate the accuracy of the method through a series of numerical experiments, including comparisons with frequency-based methods. We show that accuracy increases when relational and frequency-based data are combined, indicating that the two sources of information encode different aspects of authorial style.
Keywords :
Markov processes; entropy; relational databases; text analysis; word processing; Markov chains; authorial styles; authorship attribution method; content independent words; entropy measures; frequency-based data; function words; grammatical relationships; information sources; normalized word adjacency networks; relational data; relational structures; Accuracy; Entropy; Frequency measurement; Markov processes; Pragmatics; Support vector machines; Wide area networks; Authorship attribution; Markov chain; relative entropy; word adjacency network;
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
DOI :
10.1109/ICASSP.2013.6638728
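Illustrative_Sketch :
The abstract describes a pipeline: build a normalized adjacency network over function words, read it as a Markov chain, and compare chains with an entropy measure. The Python sketch below shows one way this could look. It is not the authors' implementation. The function-word list, the window size, the distance discount, the additive smoothing, and every function name are assumptions made for illustration, and relative entropy weighted by the stationary distribution is used here as one plausible choice of entropy measure.

```python
import math
import re

# Hypothetical, tiny function-word list; a real study would use a much longer one.
FUNCTION_WORDS = ["the", "of", "and", "to", "in", "a", "that", "but"]


def adjacency_network(text, words=FUNCTION_WORDS, window=5, discount=0.75):
    """Accumulate directed co-occurrence weights between function words.

    A pair (w1, w2) gains weight when w2 follows w1 within `window` tokens.
    The window size and the distance discount are illustrative assumptions.
    """
    index = {w: i for i, w in enumerate(words)}
    n = len(words)
    counts = [[0.0] * n for _ in range(n)]
    tokens = re.findall(r"[a-z']+", text.lower())
    for pos, tok in enumerate(tokens):
        if tok not in index:
            continue
        for d in range(1, window + 1):
            if pos + d >= len(tokens):
                break
            nxt = tokens[pos + d]
            if nxt in index:
                counts[index[tok]][index[nxt]] += discount ** (d - 1)
    return counts


def normalize(counts, smoothing=1e-3):
    """Row-normalize the weights into a Markov transition matrix.

    Additive smoothing (an assumption) keeps every entry positive so the
    relative entropy below stays finite.
    """
    chain = []
    for row in counts:
        total = sum(row) + smoothing * len(row)
        chain.append([(c + smoothing) / total for c in row])
    return chain


def stationary(chain, iterations=200):
    """Approximate the stationary distribution by power iteration."""
    n = len(chain)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * chain[i][j] for i in range(n)) for j in range(n)]
    return pi


def relative_entropy(p, q):
    """Relative entropy of chain p with respect to chain q, weighted by p's stationary distribution."""
    pi = stationary(p)
    n = len(p)
    return sum(
        pi[i] * p[i][j] * math.log(p[i][j] / q[i][j])
        for i in range(n)
        for j in range(n)
    )


def attribute(text, profiles):
    """Return the candidate whose profile chain is closest to the text's chain."""
    chain = normalize(adjacency_network(text))
    return min(profiles, key=lambda name: relative_entropy(chain, profiles[name]))
```

In this sketch, `profiles` is a hypothetical dictionary mapping each candidate author's name to a chain built from that author's known texts with `normalize(adjacency_network(...))`. An unknown text is attributed to the candidate with the smallest relative entropy.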