Title of article
A Framework for Authorship Identification of Online Messages: Writing-Style Features and Classification Techniques
Author/Authors
Rong Zheng, Jiexun Li, Hsinchun Chen, Zan Huang
Issue Information
Monthly, serial issue, 2006
Pages
16
From page
378
To page
393
Abstract
With the rapid proliferation of Internet technologies and
applications, misuse of online messages for inappropriate
or illegal purposes has become a major concern for
society. The anonymous nature of online-message distribution
makes identity tracing a critical problem. We
developed a framework for authorship identification of
online messages to address the identity-tracing problem.
In this framework, four types of writing-style features
(lexical, syntactic, structural, and content-specific
features) are extracted and inductive learning algorithms
are used to build feature-based classification models to
identify authorship of online messages. To examine this
framework, we conducted experiments on English and
Chinese online-newsgroup messages. We compared the
discriminating power of the four types of features and
of three classification techniques: decision trees, backpropagation
neural networks, and support vector
machines. The experimental results showed that the proposed
approach was able to identify authors of online
messages with satisfactory accuracy of 70 to 95%. All
four types of message features contributed to discriminating
authors of online messages. Support vector
machines outperformed the other two classification
techniques in our experiments. The high performance
we achieved for both the English and Chinese datasets
showed the potential of this approach in a multiple-language
context.
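
The feature-extraction step described in the abstract can be illustrated with a minimal sketch. It computes a few example features from each of the four categories (lexical, syntactic, structural, content-specific) for one message. The function name `extract_style_features`, the word lists, and the specific features are assumptions chosen for illustration; this record does not list the features the authors actually used. The resulting values could then be passed as a feature vector to a classifier such as a decision tree, neural network, or support vector machine.

```python
import re
import string

# Hypothetical, tiny word lists chosen for illustration only;
# the paper's actual feature set is not given in this record.
FUNCTION_WORDS = ("the", "of", "and", "to", "in", "that", "is", "for")
CONTENT_KEYWORDS = ("sale", "price", "software")


def extract_style_features(message: str) -> dict:
    """Return a small dictionary of writing-style features for one message."""
    words = re.findall(r"[A-Za-z']+", message.lower())
    n_words = max(len(words), 1)    # guard against division by zero
    n_chars = max(len(message), 1)
    paragraphs = [p for p in re.split(r"\n\s*\n", message) if p.strip()]

    features = {
        # Lexical: character- and word-level statistics.
        "lex_avg_word_len": sum(len(w) for w in words) / n_words,
        "lex_vocab_richness": len(set(words)) / n_words,
        "lex_digit_ratio": sum(c.isdigit() for c in message) / n_chars,
        # Syntactic: punctuation usage (function words are added below).
        "syn_punct_ratio": sum(c in string.punctuation for c in message) / n_chars,
        # Structural: layout of the message.
        "str_n_lines": len(message.splitlines()),
        "str_n_paragraphs": len(paragraphs),
        "str_has_greeting": int(bool(re.match(r"\s*(hi|hello|dear)\b", message, re.I))),
    }
    # Syntactic: relative frequency of each function word.
    for fw in FUNCTION_WORDS:
        features[f"syn_fw_{fw}"] = words.count(fw) / n_words
    # Content-specific: relative frequency of each topic keyword.
    for kw in CONTENT_KEYWORDS:
        features[f"con_kw_{kw}"] = words.count(kw) / n_words
    return features
```

Calling `extract_style_features(text)` on each message and stacking the values in a fixed key order gives one row per message, which is the usual input shape for feature-based classifiers.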
Journal title
Journal of the American Society for Information Science and Technology
Serial Year
2006
Record number
844074