• DocumentCode
    3734212
  • Title

    Data collection and analysis of GitHub repositories and users

  • Author

    Fragkiskos Chatziasimidis;Ioannis Stamelos

  • Author_Institution
    Gnomon Informatics SA, 21 Antonis Tritsis Str., 57001 Thessaloniki, Greece
  • fYear
    2015
  • fDate
    7/1/2015 12:00:00 AM
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    In this paper, we present the collection and mining of GitHub data, aiming to understand GitHub user behavior and project success factors. We collected information about approximately 100K projects and 10K GitHub users/owners of these projects via the GitHub API. We then statistically analyzed the data, discretized feature values with the k-means algorithm, and finally applied the Apriori algorithm in Weka to discover association rules. Assuming that project success can be measured by the number of downloads, we kept only the rules whose right-hand side was a download count above a threshold of 1000 downloads. The results provide interesting insight into the GitHub ecosystem and yield seven success rules for GitHub projects.
  • Publisher
    ieee
  • Conference_Titel
    Information, Intelligence, Systems and Applications (IISA), 2015 6th International Conference on
  • Type

    conf

  • DOI
    10.1109/IISA.2015.7388026
  • Filename
    7388026