DocumentCode
2872266
Title
A Categorization Algorithm for Harmful Text Information Filtering
Author
Juan Du ; Zhi an Yi
Author_Institution
Software Coll., Northeast Pet. Univ., Daqing, China
fYear
2012
fDate
2-4 Nov. 2012
Firstpage
31
Lastpage
34
Abstract
Harmful text information filtering is a typical small-sample pattern recognition problem: because samples containing harmful information are difficult to obtain, the classifier's predictions are biased toward the class with more samples. Constructing virtual samples is an effective means of solving small-sample pattern recognition problems. Using an up-sampling method to construct virtual samples at the data layer, the traditional KNN algorithm is improved as follows: the small sample set is divided into clusters by K-means clustering, and virtual samples are generated within each cluster and verified for validity. The experimental results show that this method can construct virtual samples whose characteristics are similar to those of the real samples, and can expand the small sample set so that harmful text information is identified effectively.
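The abstract names the steps (K-means on the small sample set, virtual sample generation within each cluster, a validity check, KNN classification) but not their exact form. The following is a minimal sketch under stated assumptions, not the authors' method. It assumes texts are already converted to numeric feature vectors, generates each virtual sample by linear interpolation between two real samples of the same cluster (a SMOTE-style choice swapped in here), and treats a virtual sample as "valid" if a KNN majority vote over the real data assigns it to the harmful (minority) class. The function names `kmeans`, `knn_predict`, and `generate_virtual_samples` and all parameter values are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's K-means; returns a cluster label for each row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute centers; an empty cluster keeps its previous center.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def knn_predict(X_train, y_train, x, k=5):
    """Majority vote among the k nearest training samples."""
    d = ((X_train - x) ** 2).sum(axis=1)
    nearest = y_train[np.argsort(d)[:k]]
    values, counts = np.unique(nearest, return_counts=True)
    return values[counts.argmax()]

def generate_virtual_samples(X_min, X_all, y_all, minority_label,
                             n_clusters=3, n_new=20, k_check=5, seed=0):
    """Hypothetical sketch: cluster the minority set, interpolate inside a cluster,
    and keep a candidate only if a KNN vote on the real data labels it as minority."""
    rng = np.random.default_rng(seed)
    labels = kmeans(X_min, n_clusters, seed=seed)
    accepted, attempts = [], 0
    while len(accepted) < n_new and attempts < 50 * n_new:
        attempts += 1
        members = X_min[labels == rng.integers(n_clusters)]
        if len(members) < 2:
            continue  # need two real samples in the same cluster
        a, b = members[rng.choice(len(members), size=2, replace=False)]
        candidate = a + rng.random() * (b - a)  # point on the segment a..b
        # Assumed validity check: the candidate must look like the minority class.
        if knn_predict(X_all, y_all, candidate, k=k_check) == minority_label:
            accepted.append(candidate)
    return np.array(accepted).reshape(-1, X_min.shape[1])

# Usage on synthetic 2-D data: 200 "normal" samples, 15 "harmful" samples.
rng = np.random.default_rng(1)
X_major = rng.normal(0.0, 1.0, size=(200, 2))
X_minor = rng.normal(4.0, 0.5, size=(15, 2))
X = np.vstack([X_major, X_minor])
y = np.array([0] * 200 + [1] * 15)

virtual = generate_virtual_samples(X_minor, X, y, minority_label=1, n_new=20)
X_aug = np.vstack([X, virtual])
y_aug = np.concatenate([y, np.ones(len(virtual), dtype=int)])
print(virtual.shape, knn_predict(X_aug, y_aug, np.array([4.0, 4.0])))
```

Interpolating only between samples of the same cluster keeps virtual samples inside regions already occupied by real harmful samples, which is one plausible reading of generating and verifying virtual samples "in the cluster"; the KNN check then discards candidates that fall among majority-class samples.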
Keywords
information filtering; pattern classification; pattern clustering; sampling methods; text analysis; K-means clustering; categorization algorithm; classifier prediction result; data layer; harmful text information filtering; improved KNN algorithm; pattern recognition problem; real sample characteristics; up-sampling method; virtual sample generation; Classification algorithms; Clustering algorithms; Genetic algorithms; Genetics; Information filtering; Support vector machine classification; Training; Harmful information filtering; Network information security; Small sample pattern recognition; Virtual sample;
fLanguage
English
Publisher
IEEE
Conference_Titel
Multimedia Information Networking and Security (MINES), 2012 Fourth International Conference on
Conference_Location
Nanjing
Print_ISBN
978-1-4673-3093-0
Type
conf
DOI
10.1109/MINES.2012.13
Filename
6405624