• Title of article

    Authorship identification from unstructured texts: A stylometric approach

  • Author/Authors

    Ameri, Reyhaneh (Department of Computer Engineering, Sharif University of Technology, Tehran, Iran); Beigy, Hamid (Department of Computer Engineering, Sharif University of Technology, Tehran, Iran)

  • Pages
    12
  • From page
    34
  • To page
    45
  • Abstract
    With the increasing use of the Internet, a considerable volume of text is exchanged in cyberspace, where individuals can hide their true identities. Abuses that arise in online communities from this anonymity reduce trust in cyberspace and create many challenges. Hence, maintaining the security of this space by monitoring user-generated content and identifying the authors of documents is becoming increasingly important. Author identification is the task of determining the author of an anonymous document. Since no standard corpus exists for the Persian language, we created one for authorship analysis applications in this language. In this paper, we propose an approach that models each author's writing style using stylometric features extracted from their documents. The performance of author identification is further improved by pre-processing the documents and by reducing the dimensionality of the feature space, keeping the features with the highest discriminative capability. The proposed approach is evaluated with standard data-mining performance measures through experiments on benchmark datasets of standard Persian and English documents. The effect of different factors on identification accuracy has also been investigated experimentally. The results show that the proposed method outperforms related state-of-the-art methods.
  • Keywords
    stylometric, authorship identification, feature selection, classification method, writing styles
  • Journal title
    The CSI Journal on Computer Science and Engineering (JCSE)
  • Serial Year
    2020
  • Record number

    2629263
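The abstract outlines a pipeline of pre-processing, stylometric feature extraction, selection of the most discriminative features, and classification, but does not name the specific features, selection criterion, or classifier. The sketch below illustrates that general pipeline using only the Python standard library. It is not the authors' method: the English function-word list, the Fisher-score ranking used as the "discriminative capability" criterion, and the nearest-centroid classifier are all hypothetical choices made for illustration.

```python
import re
from collections import Counter
from statistics import mean, pvariance

# Hypothetical English function-word list; the paper's actual feature set is not given here.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "but"]


def preprocess(text):
    """Lower-case and collapse whitespace (an assumed stand-in for the paper's pre-processing)."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def stylometric_features(text):
    """Map a document to a fixed-length vector of simple stylometric features."""
    text = preprocess(text)
    words = re.findall(r"[a-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)
    counts = Counter(words)
    features = [
        mean(len(w) for w in words) if words else 0.0,  # average word length
        n_words / max(len(sentences), 1),               # average sentence length in words
        len(counts) / n_words,                          # type-token ratio (vocabulary richness)
        text.count(",") / n_words,                      # commas per word
    ]
    # Relative frequency of each function word
    features.extend(counts[fw] / n_words for fw in FUNCTION_WORDS)
    return features


def fisher_scores(X, y):
    """Score each feature by between-class variance over mean within-class variance."""
    labels = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        overall = mean(row[j] for row in X)
        class_means, within = [], []
        for c in labels:
            vals = [row[j] for row, lab in zip(X, y) if lab == c]
            class_means.append(mean(vals))
            within.append(pvariance(vals))
        between = mean((m - overall) ** 2 for m in class_means)
        scores.append(between / (mean(within) + 1e-9))  # epsilon avoids division by zero
    return scores


def select_top_k(scores, k):
    """Return the indices of the k highest-scoring (most discriminative) features."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


class NearestCentroid:
    """Assign a document to the author whose mean (selected) feature vector is closest."""

    def fit(self, X, y, keep):
        self.keep = keep
        self.centroids = {}
        for c in set(y):
            rows = [[row[j] for j in keep] for row, lab in zip(X, y) if lab == c]
            self.centroids[c] = [mean(col) for col in zip(*rows)]
        return self

    def predict(self, x):
        v = [x[j] for j in self.keep]
        # Squared Euclidean distance on unscaled features
        return min(self.centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(v, self.centroids[c])))


if __name__ == "__main__":
    # Toy corpus: two hypothetical authors with clearly different styles
    docs = [
        ("The cat sat on the mat. The dog ran to the park.", "A"),
        ("The sun rose and the birds sang. The day was long.", "A"),
        ("However, notwithstanding considerable deliberation, committees, boards, and councils disagreed.", "B"),
        ("Consequently, extraordinarily elaborate, convoluted, interminable arguments, objections, proliferated.", "B"),
    ]
    X = [stylometric_features(t) for t, _ in docs]
    y = [a for _, a in docs]
    keep = select_top_k(fisher_scores(X, y), k=3)
    model = NearestCentroid().fit(X, y, keep)
    print(model.predict(stylometric_features("The boy and the girl went to the shop.")))  # A
```

The features are left unscaled here for brevity, so distances are dominated by the largest-valued features. A practical system would typically standardize each feature and use a stronger classifier evaluated on a held-out set.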