Abstract:
E-mail has become an important means of
electronic communication, but its usefulness is
undermined by Unsolicited Bulk E-mail (UBE) messages. UBE
takes many forms, including pornographic and virus-infected
messages, ‘cry-for-help’ messages, and fake or
fraudulent offers, advertisements, and promotions of stocks and
shares, jobs, winnings, and medicines. UBE poses technical
and socio-economic challenges to the use of e-mail. Meeting
these challenges and combating this menace requires an
understanding of UBE. To this end, a content-based textual
analysis of more than 3100 unstructured UBE documents
advertising stocks and shares is presented. The paper is aimed
at determining the polarity of such UBE through sentiment
analysis. Technically, this is an application of Opinion
Mining, approached with the help of Text Parsing, Tokenization,
bag-of-words (BOW), and VSDM techniques. Sentiment Analysis is used to
determine the polarity of each document because such UBE
contains the spammer's opinion about a specific stock symbol
on the share market. The sentiment-depicting words are
analyzed in the UBE corpus and scaled, and the extremes of positive
and negative opinion are identified. An attempt has been
made at a polarity-based distribution of such UBE. It has
been found that in almost 50% of cases the opinions
expressed through such UBE have positive polarity, in almost
30% of cases they are negative, and in almost 20% of cases
they are neutral. To the best of our knowledge, and based on
a review of the related literature, determining UBE polarity
using Opinion Mining to understand spammer behaviour
is a new concept.
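
The pipeline summarized above (tokenization, a bag-of-words representation, scoring of sentiment-depicting words, and a polarity-based distribution) can be illustrated with a minimal sketch. This is not the implementation used in the paper: the word lists POSITIVE and NEGATIVE, the function names, and the scoring rule (positive minus negative hits, divided by total hits) are all assumptions made here for illustration only.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (an assumption for illustration);
# the sentiment word list actually used in the paper is not given here.
POSITIVE = {"buy", "gain", "profit", "soar", "strong"}
NEGATIVE = {"sell", "loss", "drop", "risk", "weak"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(text):
    """Return token counts for one document (word order is discarded)."""
    return Counter(tokenize(text))


def polarity(text):
    """Return (label, score) with score scaled to the range [-1, 1].

    Assumed scoring rule: (positive hits - negative hits) / total hits.
    A document with no lexicon hits is treated as neutral.
    """
    bow = bag_of_words(text)
    pos = sum(bow[w] for w in POSITIVE)
    neg = sum(bow[w] for w in NEGATIVE)
    total = pos + neg
    score = 0.0 if total == 0 else (pos - neg) / total
    if score > 0:
        return "positive", score
    if score < 0:
        return "negative", score
    return "neutral", score


def polarity_distribution(docs):
    """Return the fraction of documents carrying each polarity label."""
    labels = Counter(polarity(d)[0] for d in docs)
    n = len(docs)
    return {
        label: (labels[label] / n if n else 0.0)
        for label in ("positive", "negative", "neutral")
    }
```

Calling polarity_distribution on a list of message bodies returns the share of positive, negative, and neutral documents, which is the kind of summary reported above; the figures obtained depend entirely on the lexicon and scoring rule chosen.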