• DocumentCode
    1909350
  • Title
    Translate Once, Translate Twice, Translate Thrice and Attribute: Identifying Authors and Machine Translation Tools in Translated Text

  • Author
    Caliskan, Aylin; Greenstadt, Rachel

  • Author_Institution
    Dept. of Comput. Sci., Drexel Univ., Philadelphia, PA, USA
  • fYear
    2012
  • fDate
    19-21 Sept. 2012
  • Firstpage
    121
  • Lastpage
    125
  • Abstract
    In this paper, we investigate the effects of machine translation tools on translated texts and the accuracy of authorship and translator attribution of translated texts. We show that the more translation performed on a text by a specific machine translation tool, the more effects unique to that translator are observed. We also propose a novel method to perform machine translator and authorship attribution of translated texts using a feature set that led to 91.13% and 91.54% accuracy on average, respectively. We claim that the features leading to the highest accuracy in translator attribution are translator-dependent features and that even though translator-effect-heavy features are present in translated text, we can still succeed in authorship attribution. These findings demonstrate that stylometric features of the original text are preserved at some level despite multiple consecutive translations and the introduction of translator-dependent features. The main contribution of our work is the discovery of a feature set used to accurately perform both translator and authorship attribution on a corpus of diverse topics from the twenty-first century, which has been consecutively translated multiple times using machine translation tools.
  • Keywords
    language translation; text analysis; author identification; authorship attribution; feature set discovery; machine translation tools; stylometric features; text translation; translator attribution; translator-dependent features; translator-effect-heavy features; Accuracy; Computer science; Feature extraction; Google; Privacy; Semantics; Writing; anonymity; machine learning; machine translation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Semantic Computing (ICSC), 2012 IEEE Sixth International Conference on
  • Conference_Location
    Palermo
  • Print_ISBN
    978-1-4673-4433-3
  • Type
    conf
  • DOI
    10.1109/ICSC.2012.46
  • Filename
    6337093