Title of article :
Stylometric analyses using Dirichlet process mixture models
Author/Authors :
Gill, Paramjit S. and Swartz, Tim B.
Issue Information :
Serial issue, year 2011
Pages :
10
From page :
3665
To page :
3674
Abstract :
Stylometry refers to the statistical analysis of literary style of authors based on the characteristics of expression in their writings. We propose an approach to stylometry based on a Bayesian Dirichlet process mixture model using multinomial word frequency data. The parameters of the multinomial distribution of word frequency data are the “word prints” of the author. Our approach is based on model-based clustering of the vectors of probability values of the multinomial distribution. The resultant clusters identify different writing styles that assist in author attribution for disputed works in a corpus. As a test case, the methodology is applied to the problem of authorship attribution involving the Federalist papers. Our results are consistent with previous stylometric analyses of these papers.
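As a rough illustration of model-based clustering of multinomial word-frequency data under a Dirichlet process prior, the sketch below runs a collapsed Gibbs sampler over a few toy count vectors. It is not the authors' algorithm: the record does not describe their sampler, and this sketch integrates out each cluster's multinomial probabilities instead of sampling them explicitly. The function names (`log_dirmult`, `gibbs_sweep`), the concentration `alpha`, the symmetric Dirichlet base parameter `beta`, and the toy counts are all assumptions made for illustration.

```python
import math
import random


def log_dirmult(counts, prior):
    """Log marginal likelihood of a count vector under a Dirichlet prior.

    The multinomial coefficient is omitted; it is identical for every
    candidate cluster and cancels when the weights are normalised.
    """
    total_prior = sum(prior)
    total_count = sum(counts)
    out = math.lgamma(total_prior) - math.lgamma(total_prior + total_count)
    for c, a in zip(counts, prior):
        out += math.lgamma(a + c) - math.lgamma(a)
    return out


def gibbs_sweep(docs, labels, alpha, beta, rng):
    """One collapsed Gibbs sweep: Chinese restaurant process prior on the
    partition, Dirichlet-multinomial likelihood within each cluster."""
    vocab_size = len(docs[0])
    for i, doc in enumerate(docs):
        labels[i] = None  # take document i out of its cluster
        clusters = sorted({z for z in labels if z is not None})
        candidates, log_weights = [], []
        for k in clusters:
            members = [docs[j] for j, z in enumerate(labels) if z == k]
            pooled = [sum(col) for col in zip(*members)]
            posterior = [beta + c for c in pooled]
            candidates.append(k)
            log_weights.append(math.log(len(members)) + log_dirmult(doc, posterior))
        # Option of opening a new cluster, weighted by alpha.
        candidates.append(max(clusters, default=-1) + 1)
        log_weights.append(math.log(alpha) + log_dirmult(doc, [beta] * vocab_size))
        top = max(log_weights)
        weights = [math.exp(w - top) for w in log_weights]
        labels[i] = rng.choices(candidates, weights=weights)[0]
    return labels


# Toy word-count vectors (hypothetical): two texts favour word 0, two favour word 1.
docs = [[30, 5, 5], [28, 6, 6], [5, 30, 5], [6, 29, 5]]
labels = [0, 0, 0, 0]
rng = random.Random(0)
for _ in range(50):
    labels = gibbs_sweep(docs, labels, alpha=1.0, beta=1.0, rng=rng)
print(labels)
```

Texts that end up sharing a label are treated as having the same writing style. In an attribution setting, a disputed text would be compared against the cluster labels of texts with known authors, ideally across many posterior draws and not a single sweep.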
Keywords :
Clustering, Dirichlet process priors, Multinomial distribution, Federalist papers, Bayesian methods, disputed authorship, Computational Linguistics
Journal title :
Journal of Statistical Planning and Inference
Serial Year :
2011
Record number :
2221646