  • DocumentCode
    1776442
  • Title
    Hadoop MapReduce implementation of a novel scheme for term weighting in text categorization
  • Author
    Dalavi, Manesh; Cheke, Shailesh
  • Author_Institution
    Dept. of Comput. Eng., RMD Sinhgad Sch. of Eng., Pune, India
  • fYear
    2014
  • fDate
    10-11 July 2014
  • Firstpage
    994
  • Lastpage
    999
  • Abstract
    Text categorization is the problem of assigning text documents to a fixed number of predefined categories. Feature selection and term weighting are two important steps that determine the outcome of any text categorization task. In this paper we focus on two goals: first, to develop effective term weighting by proposing a new term weighting scheme, and second, to exploit the parallel and distributed processing capability of Hadoop MapReduce for training and testing on the dataset. Together, these yield a substantial performance improvement in text categorization, with a remarkable gain in accuracy and a significant reduction in computational cost. In addition, the use of Hadoop MapReduce significantly reduces training and testing time.
  • Keywords
    computational complexity; feature selection; parallel processing; text analysis; Hadoop MapReduce implementation; computational cost reduction; distributed processing capability; term weighting; text categorization problem; text documents; Accuracy; Electronic mail; Instruments; Support vector machines; Testing; Text categorization; Training; Hadoop MapReduce; Support Vector Machine; Weighting Scheme
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Control, Instrumentation, Communication and Computational Technologies (ICCICCT), 2014 International Conference on
  • Conference_Location
    Kanyakumari
  • Print_ISBN
    978-1-4799-4191-9
  • Type
    conf
  • DOI
    10.1109/ICCICCT.2014.6993104
  • Filename
    6993104