Abstract:
Public interactive communication channels such as email, instant messaging (IM), and the Short Message Service (SMS) carry large volumes of spam, including advertising, obscene, and illegal content. Most of this spam is text. From a computational linguistics perspective, text from different sources can be processed in similar ways, so processing models and systems should be portable across information types. This paper introduces a unified spam filtering model for multi-source information and proposes a method for approximately estimating the model's portability. Based on the proposed model, a support vector machine (SVM) is used to classify the information. Experimental results show that the unified spam filtering model can be applied to multi-source information and that the SVM classification algorithm achieves encouraging performance.
Keywords:
e-mail filters; information filtering; pattern classification; support vector machines; unsolicited e-mail; SVM classification algorithm; email; instant messaging; multisource information; public information interactive processes; short message service; spam information; textual information; unimodel-based multi-source portable spam filtering; Advertising; Classification algorithms; Computational linguistics; Filtering algorithms; Information filtering; Information filters; Message service; Support vector machine classification; Support vector machines; Unsolicited electronic mail;
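
The idea summarized in the abstract, one text representation shared by every message source with an SVM on top, can be illustrated with a minimal sketch. This is not the paper's implementation: the message dictionaries, the function names (`featurize`, `train_linear_svm`, `is_spam`), the toy data, and the choice of a bias-free linear SVM trained by Pegasos-style subgradient descent are all assumptions made for illustration. The sketch trains on email and IM messages and then classifies SMS messages, which is one simple way to probe portability across sources.

```python
import random
import re


def featurize(message):
    """Map a message from any source into one shared bag-of-words space.

    The "source" field is deliberately ignored, so email, IM, and SMS
    messages all end up in the same feature representation.
    """
    counts = {}
    for token in re.findall(r"[a-z0-9]+", message["text"].lower()):
        counts[token] = counts.get(token, 0) + 1
    return counts


def score(weights, feats):
    """Linear decision value w . x (no bias term in this sketch)."""
    return sum(weights.get(tok, 0.0) * n for tok, n in feats.items())


def train_linear_svm(messages, epochs=50, lam=0.01, seed=0):
    """Pegasos-style subgradient descent on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    data = [(featurize(m), 1 if m["spam"] else -1) for m in messages]
    weights, step = {}, 0
    for _ in range(epochs):
        rng.shuffle(data)
        for feats, y in data:
            step += 1
            margin = y * score(weights, feats)
            # Regularization shrink: 1 - eta * lam with eta = 1 / (lam * step).
            shrink = 1.0 - 1.0 / step
            for tok in weights:
                weights[tok] *= shrink
            # Hinge-loss subgradient step for examples inside the margin.
            if margin < 1.0:
                eta = 1.0 / (lam * step)
                for tok, n in feats.items():
                    weights[tok] = weights.get(tok, 0.0) + eta * y * n
    return weights


def is_spam(weights, message):
    """Classify a message as spam when its decision value is positive."""
    return score(weights, featurize(message)) > 0.0


# Hypothetical toy data: train on two sources, test on a third.
TRAIN = [
    {"source": "email", "text": "Win a free prize now", "spam": True},
    {"source": "email", "text": "Meeting agenda attached for Monday", "spam": False},
    {"source": "im", "text": "Claim free cash prize", "spam": True},
    {"source": "im", "text": "Lunch at noon today?", "spam": False},
]
SMS_SPAM = {"source": "sms", "text": "Free cash prize, claim now"}
SMS_HAM = {"source": "sms", "text": "Running late for lunch today"}

WEIGHTS = train_linear_svm(TRAIN)
print(is_spam(WEIGHTS, SMS_SPAM), is_spam(WEIGHTS, SMS_HAM))
```

Because the feature map never looks at the source field, the same trained weights apply to any channel. A real system would need far more data, a held-out evaluation per source, and probably a library SVM rather than this hand-rolled trainer.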