Title :
Hadoop MapReduce implementation of a novel scheme for term weighting in text categorization
Author :
Dalavi, Manesh ; Cheke, Shailesh
Author_Institution :
Dept. of Comput. Eng., RMD Sinhgad Sch. of Eng., Pune, India
Abstract :
Text categorization is the problem of assigning text documents to a fixed number of predefined categories. Feature selection and term weighting are two important steps that determine the result of any text categorization task. In this paper we focus on two things: first, developing effective term weighting by proposing a new term weighting scheme, and second, using the parallel and distributed processing capability of Hadoop MapReduce for training and testing on the dataset. Together, these two contributions improve text categorization performance, with a remarkable improvement in accuracy and a significant reduction in computational cost. The use of Hadoop MapReduce also significantly reduces training and testing time.
Keywords :
computational complexity; feature selection; parallel processing; text analysis; Hadoop MapReduce implementation; computational cost reduction; distributed processing capability; term weighting; text categorization problem; text documents; Accuracy; Electronic mail; Instruments; Support vector machines; Testing; Text categorization; Training; Hadoop MapReduce; Support Vector Machine; Weighting Scheme
Conference_Title :
Control, Instrumentation, Communication and Computational Technologies (ICCICCT), 2014 International Conference on
Conference_Location :
Kanyakumari
Print_ISBN :
978-1-4799-4191-9
DOI :
10.1109/ICCICCT.2014.6993104