DocumentCode :
3359122
Title :
Automatic Text Classification of sports blog data
Author :
Dalal, Mita K. ; Zaveri, Mukesh A.
Author_Institution :
Inf. Technol. Dept., Sarvajanik Coll. of Eng. & Technol., Surat, India
fYear :
2012
fDate :
11-13 Jan. 2012
Firstpage :
219
Lastpage :
222
Abstract :
Automatic Text Classification is a semi-supervised machine learning task that automatically assigns a given text document to a set of pre-defined categories based on the features extracted from its textual content. This paper attempts to automatically classify the textual entries made by bloggers on various sports blogs into the appropriate category of sport through pre-processing, feature extraction and naïve Bayesian classification. Empirical evaluation of this technique yielded a classification accuracy of approximately 87% on the test set. In addition to classifying the textual entries of sports blogs, it is proposed that the extracted features themselves be further classified under more meaningful heads, generating a semantic resource that lends greater understanding to the classification task. This semantic resource can be used for future data mining requirements.
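The three steps named in the abstract (pre-processing, feature extraction, naïve Bayesian classification) can be illustrated with a minimal sketch. This is not the authors' implementation: the stop-word list, the bag-of-words features, the Laplace smoothing, the toy training sentences and the names `tokenize`, `train` and `classify` are all assumptions made for illustration, and the toy data says nothing about the reported 87% accuracy.

```python
import math
import re
from collections import Counter, defaultdict

# Assumed stop-word list; the record does not describe the paper's actual one.
STOP_WORDS = {"the", "a", "an", "in", "of", "to", "and", "is", "was", "on", "for"}


def tokenize(text):
    """Pre-processing: lowercase, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def train(labelled_docs):
    """Feature extraction: bag-of-words counts per class, plus the vocabulary."""
    class_counts = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labelled_docs:
        tokens = tokenize(text)
        class_counts[label] += 1
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, token_counts, vocab


def classify(text, model):
    """Multinomial naive Bayes with Laplace (add-one) smoothing, in log space."""
    class_counts, token_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_tokens = sum(token_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen in training
            likelihood = (token_counts[label][tok] + 1) / (total_tokens + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, hand-written training data (hypothetical, not from the paper).
TRAIN = [
    ("The batsman scored a century before the bowler took a wicket", "cricket"),
    ("A late wicket and a fine innings from the opening batsman", "cricket"),
    ("The striker scored a goal from a penalty kick", "football"),
    ("The goalkeeper saved a penalty and the midfielder scored a goal", "football"),
]

model = train(TRAIN)
print(classify("The bowler took another wicket", model))
print(classify("A goal from the penalty spot", model))
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied, and add-one smoothing keeps a word unseen in one class from forcing that class's probability to zero.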
Keywords :
Bayes methods; Web sites; data mining; feature extraction; learning (artificial intelligence); pattern classification; semantic Web; sport; text analysis; automatic text classification; data mining requirements; feature extraction; naïve Bayesian classification; semantic resource; semi-supervised machine learning task; sports blog data; text document; Accuracy; Bayesian methods; Blogs; Feature extraction; Semantics; Text categorization; Training; automatic text classification; feature extraction; heuristics; intelligent data mining; machine learning; naïve Bayes classification;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computing, Communications and Applications Conference (ComComAp), 2012
Conference_Location :
Hong Kong
Print_ISBN :
978-1-4577-1717-8
Type :
conf
DOI :
10.1109/ComComAp.2012.6154802
Filename :
6154802