DocumentCode :
2707429
Title :
Compression and stylometry for author identification
Author :
Pavelec, D. ; Oliveira, L.S. ; Justino, E. ; Neto, F. D Nobre ; Batista, L.V.
Author_Institution :
Pontificia Univ. Catolica do Parana, Curitiba, Brazil
fYear :
2009
fDate :
14-19 June 2009
Firstpage :
2445
Lastpage :
2450
Abstract :
In this paper we compare two different paradigms for author identification. The first one is based on compression algorithms, where the entire process of defining and extracting features and training a classifier is avoided. The second paradigm, on the other hand, takes into account the classical pattern recognition framework, where linguistic features proposed by forensic experts are used to train a Support Vector Machine classifier. Comprehensive experiments performed on a database composed of 20 writers show that both strategies achieve similar performance but with an interesting degree of complementarity demonstrated through the confusion matrices. Advantages and drawbacks of both paradigms are also discussed.
Keywords :
classification; data compression; feature extraction; learning (artificial intelligence); support vector machines; author identification; compression algorithm; forensic expert; linguistic feature extraction; pattern recognition; stylometry; support vector machine classifier training; Compression algorithms; Data mining; Feature extraction; Forensics; Frequency; Neural networks; Pattern recognition; Spatial databases; Support vector machine classification; Support vector machines; Author identification; Compression; Stylometry;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Neural Networks, 2009. IJCNN 2009. International Joint Conference on
Conference_Location :
Atlanta, GA
ISSN :
1098-7576
Print_ISBN :
978-1-4244-3548-7
Electronic_ISBN :
1098-7576
Type :
conf
DOI :
10.1109/IJCNN.2009.5178675
Filename :
5178675