Title :
Authorship attribution of text samples using neural networks and Bayesian classifiers
Author_Institution :
Dept. of Comput. Sci., Central Connecticut State Univ., New Britain, CT, USA
Abstract :
Previous work has shown that statistics of letter pairs extracted from text samples can be effective in discriminating between two authors writing in a similar style. This paper extends that work by using n-tuples for n from 1 to 5. The features used in classification are the relative frequencies of the tuples, transformed with a Karhunen-Loeve (KL) transform. Both three-layer neural network classifiers and Bayesian classifiers are used with these features to classify text samples from two similar authors. The most effective combination was 2-tuples used with a neural network classifier, although other combinations did nearly as well.
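The feature pipeline summarized in the abstract (relative n-tuple frequencies followed by a KL transform) can be sketched roughly as below. This is an illustrative sketch only, not the authors' implementation: the function names `tuple_frequencies` and `kl_transform`, the toy samples, the use of raw characters (including spaces) as tuple elements, and the choice of two retained components are all assumptions. The KL transform is taken here to be a projection onto the leading eigenvectors of the sample covariance matrix.

```python
from collections import Counter

import numpy as np


def tuple_frequencies(text, n, vocabulary):
    """Relative frequency of each n-tuple in `vocabulary` within `text`."""
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())
    if total == 0:
        return np.zeros(len(vocabulary))
    return np.array([counts[t] / total for t in vocabulary])


def kl_transform(features, k):
    """Project rows of `features` onto the top-k covariance eigenvectors."""
    mean = features.mean(axis=0)
    centered = features - mean
    cov = np.atleast_2d(np.cov(centered, rowvar=False))
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]        # indices of the k largest
    basis = eigvecs[:, order]
    return centered @ basis, mean, basis


# Toy text samples (hypothetical; the paper's corpus is not reproduced here).
samples = ["the cat sat on the mat", "a dog ran in the park", "the the the cat"]
n = 2
vocabulary = sorted({s[i:i + n] for s in samples for i in range(len(s) - n + 1)})

X = np.vstack([tuple_frequencies(s, n, vocabulary) for s in samples])
Z, mean, basis = kl_transform(X, k=2)  # k=2 is an arbitrary illustrative choice
```

The rows of `Z` would then serve as inputs to a classifier; the record does not give the network architecture details or the Bayesian model assumptions, so those steps are left out of the sketch.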
Keywords :
Bayes methods; document handling; feature extraction; feedforward neural nets; pattern classification; statistical analysis; Bayesian classifiers; KL transform; authorship attribution; classification; feature extraction; multilayer neural network classifiers; text samples; tuples; writing style; Bayesian methods; Computer science; Concatenated codes; Displays; Frequency; Karhunen-Loeve transforms; Neural networks; Statistics; Testing; Writing;
Conference_Title :
Systems, Man, and Cybernetics, 1994. Humans, Information and Technology., 1994 IEEE International Conference on
Conference_Location :
San Antonio, TX
Print_ISBN :
0-7803-2129-4
DOI :
10.1109/ICSMC.1994.400086