Title :
An Improved Text Categorization Algorithm Based on VSM
Author :
Ji Geng ; Yunling Lu ; Wei Chen ; Zhiguang Qin
Author_Institution :
Sch. of Comput. Sci. & Eng., UESTC, Chengdu, China
Abstract :
With the advent of the information age, many kinds of information have spread across the Internet, and the volume of junk information seriously affects people's lives. To filter harmful Web pages efficiently and effectively, this paper proposes a novel text classification algorithm based on the Vector Space Model (VSM). The algorithm adopts a modularized processing mode to handle Web pages. In addition, it introduces a proportion for feature selection and improves the traditional Term Frequency-Inverse Document Frequency (TF-IDF) weighting method. Furthermore, simulations of our algorithm and of existing work are presented. The comparison shows that our algorithm achieves higher accuracy and classification precision, yielding better overall system performance and classification results.
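The abstract does not specify the improved weighting or how the feature-selection proportion is applied. The sketch below therefore shows only the baseline the paper builds on: traditional TF-IDF weighting, w(t, d) = tf(t, d) · log(N / df(t)), inside a Vector Space Model with cosine similarity. The centroid (Rocchio-style) decision rule, the whitespace tokenizer, and the toy "harmful"/"normal" training data are assumptions for illustration. They are not the authors' method.

```python
import math
from collections import Counter, defaultdict


def idf_table(train_docs):
    """Traditional IDF: log(N / df(t)) over the training documents (token lists)."""
    n = len(train_docs)
    df = Counter()
    for tokens in train_docs:
        df.update(set(tokens))  # count each term at most once per document
    return {t: math.log(n / d) for t, d in df.items()}


def tfidf(tokens, idf):
    """Raw term frequency times IDF; terms unseen in training are dropped."""
    tf = Counter(tokens)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def train_centroids(labeled_docs):
    """Hypothetical centroid classifier: average TF-IDF vector per class label."""
    idf = idf_table([tokens for _, tokens in labeled_docs])
    sums = defaultdict(Counter)
    counts = Counter()
    for label, tokens in labeled_docs:
        sums[label].update(tfidf(tokens, idf))  # Counter.update adds weights
        counts[label] += 1
    centroids = {
        label: {t: w / counts[label] for t, w in vec.items()}
        for label, vec in sums.items()
    }
    return idf, centroids


def classify(tokens, idf, centroids):
    """Assign the label whose centroid is most cosine-similar to the document."""
    q = tfidf(tokens, idf)
    return max(centroids, key=lambda label: cosine(q, centroids[label]))


# Toy training data (illustrative only, not from the paper).
train = [
    ("harmful", "free casino bonus click now".split()),
    ("harmful", "casino jackpot bonus offer".split()),
    ("normal", "university course schedule lecture".split()),
    ("normal", "lecture notes on vector space model".split()),
]
idf, centroids = train_centroids(train)
print(classify("casino bonus tonight".split(), idf, centroids))  # harmful
```

A term that appears in every training document gets IDF log(1) = 0 and contributes nothing. This is one known limitation of plain TF-IDF that motivates modified weighting schemes such as the one the abstract mentions.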
Keywords :
Internet; classification; feature selection; information filtering; text analysis; VSM; Web page filtering; term frequency-inverse document frequency weighting method; text categorization algorithm; text classification algorithm; vector space model; Classification algorithms; Filtering algorithms; Information filters; Support vector machine classification; Text categorization; Web pages; modularized; proportion
Conference_Title :
Computational Science and Engineering (CSE), 2014 IEEE 17th International Conference on
Conference_Location :
Chengdu
Print_ISBN :
978-1-4799-7980-6
DOI :
10.1109/CSE.2014.313