Title :
A computational linguistic approach for the identification of translator stylometry using Arabic-English text
Author :
El-Fiqi, Heba ; Petraki, Eleni ; Abbass, Hussein A.
Author_Institution :
Sch. of Eng. & Inf. Technol., Univ. of New South Wales, Canberra, ACT, Australia
Abstract :
Translator stylometry is a small but growing area of research in computational linguistics. Although research on the wider field of authorship attribution using computational linguistics techniques has proliferated, the translator stylometry problem is more challenging and the literature on it remains sparse. Some authors have even claimed that this problem has no solution, a claim we challenge in this paper. We present an innovative set of translator stylometric features that can be used as signatures to detect and identify translators. The features are based on the concept of network motifs: small local graph substructures that have been used successfully to characterize global network dynamics. The text is transformed into a network in which words become nodes and their adjacencies within a sentence are represented by links. Motifs of size 3 are then extracted from this network, and their distribution is used as a signature for the corresponding translator. We then investigate the impact of sample size, normalization method, and dataset imbalance on classification accuracy. We evaluate several classifiers, including the Fuzzy Lattice Reasoning (FLR) classifier, which achieved the best performance with a classification accuracy reaching the 70% mark.
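The motif-extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tokenizer, the choice of directed links (word followed by next word), the frequency normalization, and the function names `build_word_network`, `canonical_triad`, and `motif_distribution` are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations, permutations
import re


def build_word_network(text):
    """Return directed edges (a, b) where word b directly follows word a
    inside a sentence. Tokenization here is a simplifying assumption."""
    edges = set()
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = re.findall(r"[a-z']+", sentence)
        for a, b in zip(words, words[1:]):
            if a != b:  # ignore self-loops
                edges.add((a, b))
    return edges


def canonical_triad(nodes, edges):
    """Return a code for the subgraph induced on 3 nodes that is the same
    for all isomorphic subgraphs (minimum over the 6 node orderings)."""
    best = None
    for p in permutations(nodes):
        code = tuple(
            (p[i], p[j]) in edges
            for i in range(3) for j in range(3) if i != j
        )
        if best is None or code < best:
            best = code
    return best


def motif_distribution(edges):
    """Relative frequency of each connected 3-node subgraph type."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    # Every connected triple has a node adjacent to both of the others,
    # so enumerating neighbour pairs around each node finds all of them.
    triples = set()
    for centre, nbrs in neighbours.items():
        for u, v in combinations(sorted(nbrs), 2):
            triples.add(frozenset((centre, u, v)))
    counts = Counter(canonical_triad(tuple(t), edges) for t in triples)
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()} if total else {}


if __name__ == "__main__":
    sample = "The cat sat. The dog sat."
    for code, share in motif_distribution(build_word_network(sample)).items():
        print(code, round(share, 2))
```

The resulting dictionary of motif frequencies is the kind of feature vector that could be passed to a classifier. Dividing by the total count is only one possible normalization, and the abstract notes that the choice of normalization method affects accuracy.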
Keywords :
computational linguistics; fuzzy reasoning; language translation; natural language processing; pattern classification; text analysis; Arabic-English text; authorship attribution; classification accuracy; computational linguistics techniques; fuzzy lattice reasoning classifier; global network dynamics; graph local substructures; imbalance dataset; network motifs; normalization dataset; translator stylometry identification; translator stylometry problem; Accuracy; Feature extraction; Helium; Testing; Training; Vegetation; Arabic-English Corpus; Authorship Attributions; Decision Tree Analysis; Fuzzy Classifier; Translator Stylometry
Conference_Title :
Fuzzy Systems (FUZZ), 2011 IEEE International Conference on
Conference_Location :
Taipei
Print_ISBN :
978-1-4244-7315-1
Electronic_ISBN :
1098-7584
DOI :
10.1109/FUZZY.2011.6007535