DocumentCode
1776442
Title
Hadoop MapReduce implementation of a novel scheme for term weighting in text categorization
Author
Dalavi, Manesh ; Cheke, Shailesh
Author_Institution
Dept. of Comput. Eng., RMD Sinhgad Sch. of Eng., Pune, India
fYear
2014
fDate
10-11 July 2014
Firstpage
994
Lastpage
999
Abstract
Text categorization is the problem of assigning text documents to a fixed number of predefined categories. Feature selection and term weighting are two important steps that determine the result of any text categorization problem. In this paper we focus on two things: first, developing effective term weighting by proposing a new term weighting scheme, and second, utilizing the parallel and distributed processing capability of Hadoop MapReduce for training and testing on the dataset. Together, these lead to a large performance improvement in text categorization, with a remarkable improvement in accuracy and a significant reduction in computational cost. The use of Hadoop MapReduce also significantly reduces training and testing time.
Keywords
computational complexity; feature selection; parallel processing; text analysis; Hadoop MapReduce implementation; computational cost reduction; distributed processing capability; term weighting; text categorization problem; text documents; Accuracy; Electronic mail; Instruments; Support vector machines; Testing; Text categorization; Training; Hadoop MapReduce; Support Vector Machine; Weighting Scheme
fLanguage
English
Publisher
IEEE
Conference_Titel
Control, Instrumentation, Communication and Computational Technologies (ICCICCT), 2014 International Conference on
Conference_Location
Kanyakumari
Print_ISBN
978-1-4799-4191-9
Type
conf
DOI
10.1109/ICCICCT.2014.6993104
Filename
6993104