• DocumentCode
    168347
  • Title
    Towards a stratified learning approach to predict future citation counts
  • Author
    Chakraborty, Tamal ; Kumar, Sudhakar ; Goyal, Puneet ; Ganguly, Niloy ; Mukherjee, Arjun
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Indian Inst. of Technol., Kharagpur, Kharagpur, India
  • fYear
    2014
  • fDate
    8-12 Sept. 2014
  • Firstpage
    351
  • Lastpage
    360
  • Abstract
    In this paper, we study the problem of predicting the future citation count of a scientific article a given time interval after its publication. To this end, we gather and conduct an exhaustive analysis on a dataset of more than 1.5 million scientific papers from the computer science domain. On analysis of the dataset, we notice that the citation count of the articles over the years follows a diverse set of patterns; on closer inspection we identify six broad categories of citation patterns. This important observation motivates us to adopt a stratified learning approach in the prediction task, whereby we propose a two-stage prediction model: in the first stage, the model maps a query paper into one of the six categories, and then in the second stage a regression module is run only on the subpopulation corresponding to that category to predict the future citation count of the query paper. Experimental results show that the categorization of this huge dataset during the training phase leads to a remarkable improvement (around 50%) in comparison to the well-known baseline system.
  • Keywords
    citation analysis; publishing; query processing; regression analysis; baseline system; citation patterns; computer science domain; future citation count prediction; huge dataset; prediction task; publication; query paper; regression module; scientific article; scientific papers; stratified learning approach; two-stage prediction model; Abstracts; Accuracy; Computer science; Predictive models; Productivity; Support vector machines; Training;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Digital Libraries (JCDL), 2014 IEEE/ACM Joint Conference on
  • Conference_Location
    London
  • Type
    conf
  • DOI
    10.1109/JCDL.2014.6970190
  • Filename
    6970190
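
The two-stage model described in the abstract (first assign a query paper to a citation-pattern category, then run a regression fitted only on that category's subpopulation) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the class name `StratifiedPredictor`, the category labels "A" and "B", the toy feature vectors, and the choice of a nearest-centroid classifier with one-variable least squares. The keywords mention support vector machines, but this sketch substitutes simpler techniques to stay dependency free and does not reproduce the authors' features or models.

```python
from collections import defaultdict
from statistics import mean


def centroid(rows):
    """Component-wise mean of equal-length feature vectors."""
    return [mean(col) for col in zip(*rows)]


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with a single predictor."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    if var == 0:  # degenerate stratum: fall back to predicting the mean
        return 0.0, my
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return a, my - a * mx


class StratifiedPredictor:
    """Hypothetical two-stage predictor: classify first, then regress."""

    def fit(self, features, categories, targets):
        # Split the training set into one subpopulation per category.
        groups = defaultdict(list)
        for f, c, t in zip(features, categories, targets):
            groups[c].append((f, t))
        # Stage 1 model: one centroid per category.
        self.centroids = {
            c: centroid([f for f, _ in rows]) for c, rows in groups.items()
        }
        # Stage 2 models: one regression line per category, using the
        # sum of the features as the single predictor (an assumption).
        self.lines = {
            c: fit_line([sum(f) for f, _ in rows], [t for _, t in rows])
            for c, rows in groups.items()
        }
        return self

    def classify(self, f):
        """Stage 1: return the category with the nearest centroid."""
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(f, self.centroids[c]))
        return min(self.centroids, key=sq_dist)

    def predict(self, f):
        """Stage 2: apply only the regression of the assigned category."""
        c = self.classify(f)
        a, b = self.lines[c]
        return c, a * sum(f) + b


# Toy data (invented): early citation counts per paper, a category
# label, and the citation count to be predicted.
features = [[1, 2], [2, 3], [3, 4], [10, 1], [12, 1], [14, 2]]
categories = ["A", "A", "A", "B", "B", "B"]
targets = [6, 10, 14, 12, 14, 17]

model = StratifiedPredictor().fit(features, categories, targets)
print(model.predict([2, 2]))
print(model.predict([13, 1]))
```

Fitting a separate regression per category means each line only has to describe papers that share a citation pattern, which is the intuition behind the stratified approach; a single global regression would have to compromise between the two toy groups.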