• DocumentCode
    457415
  • Title
    On Authorship Attribution via Markov Chains and Sequence Kernels
  • Author
    Sanderson, Conrad; Guenter, Simon
  • Author_Institution
    Australian Nat. Univ., Canberra, ACT
  • Volume
    3
  • fYear
    2006
  • fDate
    0-0 0
  • Firstpage
    437
  • Lastpage
    440
  • Abstract
    We investigate the use of recently proposed character and word sequence kernels for the task of authorship attribution and compare their performance with two probabilistic approaches based on Markov chains of characters and words. Several configurations of the sequence kernels are studied using a relatively large dataset, where each author covered several topics. Utilising Moffat smoothing, the two probabilistic approaches obtain similar performance, which in turn is comparable to that of character sequence kernels and better than that of word sequence kernels. The results further suggest that, in a realistic setup that accounts for texts not written by any of the hypothesised authors, about 5000 reference words are required to obtain good discrimination performance.
  • Keywords
    Markov processes; pattern classification; probability; Markov chains; Moffat smoothing; authorship attribution; character sequence kernels; probabilistic approaches; word sequence kernels; Australia Council; Books; Forensics; Kernel; Machine learning; Plagiarism; Smoothing methods; Support vector machines
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pattern Recognition, 2006. ICPR 2006. 18th International Conference on
  • Conference_Location
    Hong Kong
  • ISSN
    1051-4651
  • Print_ISBN
    0-7695-2521-0
  • Type
    conf
  • DOI
    10.1109/ICPR.2006.899
  • Filename
    1699558