Title :
Compression and stylometry for author identification
Author :
Pavelec, D. ; Oliveira, L.S. ; Justino, E. ; Neto, F. D Nobre ; Batista, L.V.
Author_Institution :
Pontificia Univ. Catolica do Parana, Curitiba, Brazil
Abstract :
In this paper we compare two different paradigms for author identification. The first is based on compression algorithms, which avoid the entire process of defining and extracting features and training a classifier. The second paradigm, on the other hand, follows the classical pattern recognition framework, in which linguistic features proposed by forensic experts are used to train a Support Vector Machine classifier. Comprehensive experiments performed on a database composed of 20 writers show that both strategies achieve similar performance, but with an interesting degree of complementarity demonstrated through the confusion matrices. The advantages and drawbacks of both paradigms are also discussed.
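The compression-based paradigm mentioned in the abstract can be illustrated with a minimal sketch. The abstract names neither the compressor nor the decision rule, so everything below is an assumption made for illustration: zlib stands in for the compressor, the rule picks the author whose reference text lets the unknown text compress into the fewest extra bytes, and `compressed_size` and `attribute` are hypothetical helper names, not the authors' implementation.

```python
import zlib
from typing import Mapping


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def attribute(unknown: str, corpora: Mapping[str, str]) -> str:
    """Return the author whose reference text best 'explains' the unknown text.

    Assumed rule (not taken from the paper): append the unknown text to each
    author's reference text and measure how many extra compressed bytes it
    costs. The author with the smallest cost is chosen.
    """
    def extra_bytes(reference: str) -> int:
        return compressed_size(reference + " " + unknown) - compressed_size(reference)

    return min(corpora, key=lambda author: extra_bytes(corpora[author]))


if __name__ == "__main__":
    # Toy reference texts; a real experiment would use full writing samples.
    corpora = {
        "author_a": "the quick brown fox jumps over the lazy dog " * 20,
        "author_b": "colorless green ideas sleep furiously every night " * 20,
    }
    print(attribute("the quick brown fox jumps over the lazy dog", corpora))
```

No features are defined and no classifier is trained here, which is the property the abstract attributes to the compression paradigm. A general-purpose compressor such as zlib only sees a limited window of recent text, so long reference texts would need a different compressor or chunking.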
Keywords :
classification; data compression; feature extraction; learning (artificial intelligence); support vector machines; author identification; compression algorithm; forensic expert; linguistic feature extraction; pattern recognition; stylometry; support vector machine classifier training; Compression algorithms; Data mining; Feature extraction; Forensics; Frequency; Neural networks; Pattern recognition; Spatial databases; Support vector machine classification; Support vector machines; Author identification; Compression; Stylometry;
Conference_Title :
Neural Networks, 2009. IJCNN 2009. International Joint Conference on
Conference_Location :
Atlanta, GA
Print_ISBN :
978-1-4244-3548-7
Electronic_ISBN :
1098-7576
DOI :
10.1109/IJCNN.2009.5178675