Regional Information Center for Science and Technology (RICeST) - Short Text Classification based on feature extension using The N-Gram model

DocumentCode :

3730437

Title :

Short Text Classification based on feature extension using The N-Gram model

Author :

Xinwei Zhang; Bin Wu

Author_Institution :

Beijing Key Laboratory of Intelligent Telecommunications Software and Multimedia, Beijing University of Posts and Telecommunications, China

fYear :

2015

Firstpage :

710

Lastpage :

716

Abstract :

With the rapid development of Web 2.0, more and more people share their lives and opinions on social media websites and forums such as Weibo, Twitter, and Tianya, which produce masses of short texts. To manage these short texts effectively, Short Text Classification has become an important branch of Text Classification. However, because short texts are brief, carry few signals, and have sparse features, it is very difficult to achieve high-quality classification with conventional methods. This paper proposes a novel feature extension method based on the N-Gram model to address feature sparseness. We extract n-grams from continuous word sequences in the training set to build our feature extension pattern library. Then, using the features that appear in a short text, we compute the probability of occurrence of other words that do not appear in the original text. We apply our extension method to a data set collected from Sina Weibo. After extending the features of the original short texts, we use the Naïve Bayes algorithm to train and evaluate a classifier, and we measure performance with precision, recall, and F1-score. The results show that the extension method based on the N-Gram model noticeably improves classification performance.
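The abstract describes the pipeline only at a high level, so the following is a minimal Python sketch of the general idea under stated assumptions, not the authors' implementation. It assumes texts are already segmented into word tokens, uses bigrams (n = 2) as the n-gram library, scores each absent candidate word by its highest conditional probability P(word | context) over the contexts in the short text, adds candidates that reach a hypothetical `threshold` as fractional-weight features, and then trains a multinomial Naive Bayes classifier with add-one smoothing. The function names, the threshold value, the max-probability scoring rule, and the toy data are all illustrative assumptions.

```python
from collections import Counter, defaultdict
import math

# Toy tokenized data (hypothetical); a real run would use segmented Weibo posts.
TRAIN = [
    (["stock", "market", "rises"], "finance"),
    (["stock", "market", "falls"], "finance"),
    (["bank", "interest", "rate"], "finance"),
    (["football", "match", "tonight"], "sports"),
    (["football", "team", "wins"], "sports"),
    (["basketball", "match", "score"], "sports"),
]
TEST = [(["stock"], "finance"), (["football"], "sports"),
        (["interest"], "finance"), (["match", "tonight"], "sports")]


def build_ngram_library(train_docs, n=2):
    """Map each (n-1)-word context to a Counter of the words that follow it."""
    library = defaultdict(Counter)
    for tokens in train_docs:
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])
            library[context][tokens[i + n - 1]] += 1
    return library


def extend_features(tokens, library, n=2, threshold=0.3):
    """Original tokens get weight 1; each absent word whose estimated
    P(word | context) reaches the (hypothetical) threshold is added with
    that probability as its weight."""
    features = Counter(tokens)
    scores = defaultdict(float)
    # Every (n-1)-word window of the short text serves as a context.
    for i in range(len(tokens) - n + 2):
        followers = library.get(tuple(tokens[i:i + n - 1]))
        if not followers:
            continue
        total = sum(followers.values())
        for word, count in followers.items():
            if word not in features:
                # Keep the strongest evidence across contexts (an assumption).
                scores[word] = max(scores[word], count / total)
    for word, prob in scores.items():
        if prob >= threshold:
            features[word] += prob
    return features


class WeightedNaiveBayes:
    """Multinomial Naive Bayes over fractional feature weights, add-one smoothing."""

    def fit(self, feature_list, labels):
        self.classes = sorted(set(labels))
        label_counts = Counter(labels)
        self.log_prior = {c: math.log(label_counts[c] / len(labels)) for c in self.classes}
        self.weights = {c: Counter() for c in self.classes}
        self.vocab = set()
        for feats, label in zip(feature_list, labels):
            self.weights[label].update(feats)
            self.vocab.update(feats)
        self.totals = {c: sum(self.weights[c].values()) for c in self.classes}
        return self

    def predict(self, feats):
        vocab_size = len(self.vocab)
        best_class, best_score = None, -math.inf
        for c in self.classes:
            score = self.log_prior[c]
            for word, weight in feats.items():
                if word in self.vocab:  # skip words never seen in training
                    p = (self.weights[c][word] + 1) / (self.totals[c] + vocab_size)
                    score += weight * math.log(p)
            if score > best_score:
                best_class, best_score = c, score
        return best_class


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1 for one class treated as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    library = build_ngram_library([tokens for tokens, _ in TRAIN])
    model = WeightedNaiveBayes().fit(
        [extend_features(tokens, library) for tokens, _ in TRAIN],
        [label for _, label in TRAIN],
    )
    y_true = [label for _, label in TEST]
    y_pred = [model.predict(extend_features(tokens, library)) for tokens, _ in TEST]
    print(y_pred)
    print(precision_recall_f1(y_true, y_pred, positive="finance"))
```

Giving an extended word its estimated probability as a weight, rather than a full count of 1, keeps inferred words from outweighing words that actually occur in the short text; the paper may select or weight extensions differently.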

Keywords :

"Feature extraction","Libraries","Computational modeling","Semantics","Text categorization","Classification algorithms","Internet"

Publisher :

IEEE

Conference_Title :

Fuzzy Systems and Knowledge Discovery (FSKD), 2015 12th International Conference on

Type :

conf

DOI :

10.1109/FSKD.2015.7382029

Filename :

7382029

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3730437