DocumentCode :
182100
Title :
An Efficient Intrinsic Authorship Verification Scheme Based on Ensemble Learning
Author :
Halvani, Oren ; Steinebach, Martin
Author_Institution :
Media Security & IT Forensics, Fraunhofer Inst. for Secure Inf. Technol., Darmstadt, Germany
fYear :
2014
fDate :
8-12 Sept. 2014
Firstpage :
571
Lastpage :
578
Abstract :
Authorship Verification is an important subdiscipline of digital text forensics. Its goal is to decide whether two texts were written by the same author. We present an efficient Authorship Verification scheme based on an ensemble of K-Nearest Neighbor classifiers, where each classifier generates a decision regarding a feature category. Our scheme provides several benefits, such as independence from linguistic resources like thesauri or language models. Furthermore, it can handle different Indo-European languages, for instance English, German, Spanish, Greek, Dutch, Swedish, or French. Another benefit is its low runtime, since no deep linguistic processing (tagging, chunking, parsing, etc.) is performed. Moreover, our scheme can easily be modified, for example by replacing the distance function, the acceptance criterion, or the features used, including their parameters. The proposed scheme is evaluated against the publicly available PAN-2013 Author Identification (AI) test corpus, where it ranked second in the top ten list, as well as against five other test corpora that we compiled ourselves. Our experiments show that it is possible to achieve promising results even when using a fixed setting of parameters and features across seven different languages.
Keywords :
digital forensics; learning (artificial intelligence); natural language processing; pattern classification; text analysis; Dutch language; English language; French language; German language; Greek language; Indo-European languages; Spanish language; Swedish language; acceptance criterion; digital text forensics; distance function; ensemble learning; feature category; intrinsic authorship verification scheme; k-nearest neighbor classifiers; linguistic resource independence; publicly available PAN-2013 AI test corpus; publicly available PAN-2013 Author Identification test corpus; runtime analysis; Artificial intelligence; Feature extraction; Forensics; Noise reduction; Pragmatics; Training; Vectors; Authorship verification; cross-language learning; digital text forensics; one-class-classification;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Availability, Reliability and Security (ARES), 2014 Ninth International Conference on
Conference_Location :
Fribourg
Type :
conf
DOI :
10.1109/ARES.2014.84
Filename :
6980334