Title :
Short Text Classification based on feature extension using the N-Gram model
Author :
Xinwei Zhang; Bin Wu
Author_Institution :
Beijing Key Laboratory of Intelligent Telecommunications Software and Multimedia, Beijing University of Posts and Telecommunications, China
Abstract :
With the rapid development of Web 2.0, more and more people share their lives and opinions on social media websites and forums such as Weibo, Twitter and Tianya, which produce masses of short texts. To manage these short texts effectively, short text classification has become an important branch of text classification. However, because short texts are brief, carry few signals, and have sparse features, it is very difficult to achieve high-quality classification with conventional methods. This paper proposes a novel feature extension method based on the N-Gram model to address feature sparseness. From continuous word sequences in the training set, we extract n-grams to build a feature extension pattern library. Then, using the features that appear in a short text, we compute the occurrence probability of other words that do not appear in the original text. We apply our extension method to a data set collected from Sina Weibo. After extending the features of the original short texts, we use the Naïve Bayes algorithm to train and evaluate a classifier, measuring performance with precision, recall and F1-score. The results show that the extension method based on the N-Gram model noticeably improves classification performance.
Keywords :
"Feature extraction","Libraries","Computational modeling","Semantics","Text categorization","Classification algorithms","Internet"
Conference_Title :
Fuzzy Systems and Knowledge Discovery (FSKD), 2015 12th International Conference on
DOI :
10.1109/FSKD.2015.7382029
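
The feature extension step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the n-gram order, the probability estimate, or any cutoff, so the use of bigrams, the relative-frequency estimate of P(w2 | w1), the `threshold` parameter, and the function names `build_bigram_library` and `extend_features` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_bigram_library(tokenized_docs):
    """Count bigrams (w1, w2) over continuous word sequences in the training set.

    Assumption: n = 2. The abstract only says "n-grams".
    """
    follow = defaultdict(Counter)
    for tokens in tokenized_docs:
        for w1, w2 in zip(tokens, tokens[1:]):
            follow[w1][w2] += 1
    return follow


def extend_features(tokens, follow, threshold=0.5):
    """Append words absent from the text whose estimated P(w2 | w1) >= threshold.

    Assumption: probabilities are relative bigram frequencies, and only words
    of the original text (not newly added ones) trigger extensions.
    """
    present = set(tokens)
    extended = list(tokens)
    for w1 in tokens:
        counts = follow.get(w1)
        if not counts:
            continue
        total = sum(counts.values())
        for w2, count in counts.items():
            if w2 not in present and count / total >= threshold:
                extended.append(w2)
                present.add(w2)
    return extended


# Toy training set (hypothetical data, already tokenized).
train = [
    ["stock", "market", "falls"],
    ["stock", "market", "rises"],
    ["football", "match", "tonight"],
]
library = build_bigram_library(train)
print(extend_features(["stock"], library))
```

Under these assumptions, the extended token lists would then be turned into feature vectors and passed to a Naïve Bayes classifier in place of the original short texts.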