DocumentCode :
1776442
Title :
Hadoop MapReduce implementation of a novel scheme for term weighting in text categorization
Author :
Dalavi, Manesh ; Cheke, Shailesh
Author_Institution :
Dept. of Comput. Eng., RMD Sinhgad Sch. of Eng., Pune, India
fYear :
2014
fDate :
10-11 July 2014
Firstpage :
994
Lastpage :
999
Abstract :
Text categorization is the problem of assigning text documents to a fixed number of predefined categories. Feature selection and term weighting are two important steps that determine the outcome of any text categorization task. In this paper we focus on two things: first, developing effective term weighting by proposing a new term weighting scheme; and second, using the parallel and distributed processing capability of Hadoop MapReduce for training and testing on the dataset. Together, these lead to a substantial improvement in text categorization performance: a marked gain in accuracy with a significant reduction in computational cost. In particular, the use of Hadoop MapReduce significantly reduces training and testing time.
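The abstract does not define the proposed weighting formula, so the sketch below does not reproduce it. It uses plain tf-idf as a stand-in to show how per-term weights can be computed in the map/shuffle/reduce pattern that Hadoop MapReduce parallelizes. It is an in-memory simulation under stated assumptions, not Hadoop code and not the authors' method. The function names (`map_phase`, `shuffle`, `reduce_phase`, `tfidf_mapreduce`) are hypothetical. Tokenization is naive whitespace splitting.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical sketch: standard tf-idf (NOT the paper's novel scheme),
# computed in a simulated map -> shuffle -> reduce pipeline.

Posting = Tuple[str, int]  # (doc_id, term frequency in that document)


def map_phase(doc_id: str, text: str) -> Iterable[Tuple[str, Posting]]:
    """Mapper: emit (term, (doc_id, tf)) once per distinct term in a document."""
    counts: Dict[str, int] = defaultdict(int)
    for token in text.lower().split():  # naive whitespace tokenization (assumption)
        counts[token] += 1
    for term, tf in counts.items():
        yield term, (doc_id, tf)


def shuffle(pairs: Iterable[Tuple[str, Posting]]) -> Dict[str, List[Posting]]:
    """Group mapper output by key, as the MapReduce framework does between phases."""
    grouped: Dict[str, List[Posting]] = defaultdict(list)
    for term, posting in pairs:
        grouped[term].append(posting)
    return grouped


def reduce_phase(term: str, postings: List[Posting],
                 n_docs: int) -> Iterable[Tuple[Tuple[str, str], float]]:
    """Reducer: document frequency is the number of postings for this term."""
    df = len(postings)
    idf = math.log(n_docs / df)
    for doc_id, tf in postings:
        yield (doc_id, term), tf * idf


def tfidf_mapreduce(docs: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Run the simulated job and return {(doc_id, term): weight}."""
    # In a real job, the total document count would have to be made available
    # to reducers (for example through the job configuration); here it is an argument.
    n_docs = len(docs)
    mapped = (pair for doc_id, text in docs.items()
              for pair in map_phase(doc_id, text))
    weights: Dict[Tuple[str, str], float] = {}
    for term, postings in shuffle(mapped).items():
        for key, weight in reduce_phase(term, postings, n_docs):
            weights[key] = weight
    return weights


if __name__ == "__main__":
    corpus = {"d1": "hadoop map reduce", "d2": "map text categorization"}
    for (doc_id, term), w in sorted(tfidf_mapreduce(corpus).items()):
        print(doc_id, term, round(w, 3))
```

In this toy corpus, "map" occurs in both documents, so its idf is log(2/2) = 0 and its weight is 0. Terms that occur in only one document get weight log 2. Each mapper call and each reducer call depends only on its own input, which is what allows the framework to distribute them across nodes.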
Keywords :
computational complexity; feature selection; parallel processing; text analysis; Feature selection; Hadoop MapReduce implementation; computational cost reduction; distributed processing capability; parallel processing; term weighting; text categorization problem; text documents; Accuracy; Electronic mail; Instruments; Support vector machines; Testing; Text categorization; Training; Hadoop MapReduce; Support Vector Machine; Text Categorization; Weighting Scheme;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Control, Instrumentation, Communication and Computational Technologies (ICCICCT), 2014 International Conference on
Conference_Location :
Kanyakumari
Print_ISBN :
978-1-4799-4191-9
Type :
conf
DOI :
10.1109/ICCICCT.2014.6993104
Filename :
6993104