• DocumentCode
    3659482
  • Title

    MR-VSM: Map Reduce based vector Space Model for user profiling-an empirical study on News data

  • Author

    Anjali Gautam;Punam Bedi

  • Author_Institution
    Department of Computer Science, University of Delhi, India
  • fYear
    2015
  • Firstpage
    355
  • Lastpage
    360
  • Abstract
    The velocity of data generation has increased over the past decade and is expected to increase exponentially over time. To mine useful nuggets of information for a large community of users, it is preferable to capture each user's interests, i.e., to create a user profile, and then filter content according to that user's tastes. A user may traverse a large number of documents, so the user profiling technique must scale with the growing number of documents. This paper proposes a novel user profiling technique, the Map Reduce based Vector Space Model (MR-VSM). MR-VSM is a technique for user profiling where the user interacts with data that is rich in text and large in volume. It implements the traditional VSM on Map Reduce, a parallel programming paradigm, to increase computational efficiency and to scale with the number of documents. It works by parallelizing the task of creating the term-document class of the VSM, using TF-IDF to create the term vectors. For the experimental study, this paper uses a News dataset that is rich in text and volume and was collected from the web using RSS feeds. The proposed system creates a user profile by considering the news items read by the user and creating a term vector for each news item read. The resulting user profile is a set of top-n terms. To test the computational efficiency and scalability of MR-VSM as the number of news items read by the user grows, MR-VSM is run on a Hadoop cluster for 12,000, 24,000 and 48,000 news items. VSM is also run for 1,500 news items to show the computational efficiency of the proposed approach. It is observed that, for MR-VSM, the computational time for user profiling and the scalability with respect to the news items read by the user improve as the number of nodes in the Hadoop cluster increases.
  • Keywords
    "Scalability","Feeds","Computational efficiency","Computational modeling","Filtering","Informatics","Databases"
  • Publisher
    IEEE
  • Conference_Titel
    Advances in Computing, Communications and Informatics (ICACCI), 2015 International Conference on
  • Print_ISBN
    978-1-4799-8790-0
  • Type

    conf

  • DOI
    10.1109/ICACCI.2015.7275635
  • Filename
    7275635
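
The abstract describes building a TF-IDF term vector for each news item in a map/reduce style and keeping the top-n terms as the user profile. The following is a minimal sketch of that idea, not the authors' implementation: it simulates the map and reduce phases in a single Python process rather than on Hadoop. The function names, the tokenizer, the natural-log IDF formula, and the choice to sum weights across items before taking the top-n are all assumptions made for illustration.

```python
import math
import re
from collections import defaultdict


def tokenize(text):
    # Assumed tokenizer: lowercase alphabetic runs only.
    return re.findall(r"[a-z]+", text.lower())


def map_term_counts(doc_id, text):
    # Map phase: emit ((term, doc_id), 1) for every term occurrence.
    for term in tokenize(text):
        yield (term, doc_id), 1


def reduce_sum(pairs):
    # Reduce phase: sum the values that share a key.
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def tf_idf_vectors(docs):
    # Job 1: term frequency per (term, document).
    tf = reduce_sum(
        pair
        for doc_id, text in docs.items()
        for pair in map_term_counts(doc_id, text)
    )
    # Job 2: document frequency per term (one emission per distinct pair).
    df = reduce_sum((term, 1) for (term, _doc_id) in tf)
    n_docs = len(docs)
    vectors = defaultdict(dict)
    for (term, doc_id), count in tf.items():
        # Assumed weighting: raw count times ln(N / df).
        vectors[doc_id][term] = count * math.log(n_docs / df[term])
    return dict(vectors)


def top_n_profile(vectors, n):
    # Assumed aggregation: sum each term's weight over all items read,
    # then keep the n heaviest terms (ties broken alphabetically).
    totals = defaultdict(float)
    for vector in vectors.values():
        for term, weight in vector.items():
            totals[term] += weight
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _weight in ranked[:n]]


# Hypothetical news items read by one user.
news_read = {
    "d1": "Hadoop cluster Hadoop",
    "d2": "cricket match",
    "d3": "cluster election",
}
vectors = tf_idf_vectors(news_read)
profile = top_n_profile(vectors, 2)
```

On a real cluster each job would be distributed across nodes. The sketch only shows how the term counts and document frequencies decompose into independent key-value emissions that can be summed in parallel.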