• DocumentCode
    3739208
  • Title

    Citation Prediction Using Diverse Features

  • Author

    Harish S. Bhat;Li-Hsuan Huang;Sebastian Rodriguez;Rick Dale;Evan Heit

  • Author_Institution
    Univ. of California, Merced, Merced, CA, USA
  • fYear
    2015
  • Firstpage
    589
  • Lastpage
    596
  • Abstract
    Using a large database of nearly 8 million bibliographic entries spanning over 3 million unique authors, we build predictive models to classify a paper based on its citation count. Our approach involves considering a diverse array of features including the interdisciplinarity of authors, which we quantify using Shannon entropy and Jensen-Shannon divergence. Rather than rely on subject codes, we model the disciplinary preferences of each author by estimating the author's journal distribution. We conduct an exploratory data analysis on the relationship between these interdisciplinarity variables and citation counts. In addition, we model the effects of (1) each author's influence in coauthorship graphs, and (2) words in the title of the paper. We then build classifiers for two- and three-class classification problems that correspond to predicting the interval in which a paper's citation count will lie. We use cross-validation and a true test set to tune model parameters and assess model performance. The best model we build, a classification tree, yields test set accuracies of 0.87 and 0.66 for the two- and three-class problems, respectively. Using this model, we also provide rankings of attribute importance. For the three-class problem, these rankings indicate the importance of our interdisciplinarity metrics in predicting citation counts.
  • Keywords
    "Predictive models","Entropy","Databases","Feature extraction","Training","Data mining","Measurement"
  • Publisher
    IEEE
  • Conference_Title
    Data Mining Workshop (ICDMW), 2015 IEEE International Conference on
  • Electronic_ISBN
    2375-9259
  • Type

    conf

  • DOI
    10.1109/ICDMW.2015.131
  • Filename
    7395721
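
The abstract describes quantifying author interdisciplinarity with Shannon entropy and Jensen-Shannon divergence over each author's journal distribution. The following is a minimal sketch of those two quantities, not the authors' implementation. It assumes a journal distribution is estimated as the relative frequency of journals in an author's publication list, and it uses base-2 logarithms; the function names and the sample journal labels are hypothetical.

```python
import math
from collections import Counter


def journal_distribution(journals):
    """Estimate an author's journal distribution as relative frequencies.

    `journals` is a list with one journal label per paper (an assumed input format).
    """
    counts = Counter(journals)
    total = sum(counts.values())
    return {journal: count / total for journal, count in counts.items()}


def shannon_entropy(dist):
    """Shannon entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence in bits: H(M) - (H(P) + H(Q)) / 2, with M = (P + Q) / 2."""
    support = set(p) | set(q)
    mixture = {j: 0.5 * (p.get(j, 0.0) + q.get(j, 0.0)) for j in support}
    return shannon_entropy(mixture) - 0.5 * (shannon_entropy(p) + shannon_entropy(q))


# Hypothetical authors: one splits papers evenly across two journals,
# the other publishes in a single, different journal.
author_a = journal_distribution(["J1", "J1", "J2", "J2"])
author_b = journal_distribution(["J3"])

entropy_a = shannon_entropy(author_a)
entropy_b = shannon_entropy(author_b)
divergence_ab = jensen_shannon_divergence(author_a, author_b)
```

Under these assumptions, an author who publishes in more journals, more evenly, has higher entropy, and two authors with no journals in common reach the maximum divergence of 1 bit.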