Title :
Data collection and analysis of GitHub repositories and users
Author :
Fragkiskos Chatziasimidis;Ioannis Stamelos
Author_Institution :
Gnomon Informatics SA, 21 Antonis Tritsis Str., 57001 Thessaloniki, Greece
fDate :
7/1/2015 12:00:00 AM
Abstract :
In this paper, we present the collection and mining of GitHub data, aiming to understand GitHub user behavior and project success factors. We collected information about approximately 100K projects and 10K GitHub users/owners of these projects via the GitHub API. We then statistically analyzed the data, discretized feature values with the k-means algorithm, and finally applied the Apriori algorithm in Weka to discover association rules. Assuming that project success can be measured by the number of downloads, we kept only the rules whose right-hand side was a download count above a threshold of 1000 downloads. The results provide interesting insight into the GitHub ecosystem and yield seven success rules for GitHub projects.
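The pipeline summarized in the abstract (discretize features with k-means, mine association rules, keep rules whose right-hand side is a download count above 1000) can be illustrated with a minimal sketch. This is not the authors' implementation: the paper used Weka's Apriori, whereas the code below substitutes a brute-force enumeration of small antecedents, which gives the same support and confidence values on toy data but none of Apriori's pruning. The `forks` feature, the six toy projects, the `low`/`high` bin labels, and the support and confidence thresholds are all assumptions made for illustration; only the 1000-download threshold comes from the abstract.

```python
from itertools import combinations

def kmeans_1d(values, k, iters=50):
    """Lloyd's algorithm on scalar values; returns the sorted centroids."""
    s = sorted(values)
    # Deterministic initialization: pick k points spread across the sorted data.
    centroids = [s[(2 * i + 1) * len(s) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)

def discretize(value, centroids, labels):
    """Map a numeric value to the label of its nearest centroid."""
    nearest = min(range(len(centroids)), key=lambda j: abs(value - centroids[j]))
    return labels[nearest]

def rules_for_consequent(transactions, consequent, min_support,
                         min_confidence, max_len=2):
    """Brute-force stand-in for Apriori: rules of the form antecedent -> consequent."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t if i != consequent})
    found = []
    for size in range(1, max_len + 1):
        for antecedent in combinations(items, size):
            a = set(antecedent)
            n_a = sum(1 for t in transactions if a <= t)
            if n_a == 0:
                continue
            n_both = sum(1 for t in transactions if a <= t and consequent in t)
            support = n_both / n
            confidence = n_both / n_a
            if support >= min_support and confidence >= min_confidence:
                found.append((antecedent, consequent, support, confidence))
    return found

# Hypothetical toy data; the real study covered roughly 100K projects.
projects = [
    {"forks": 2, "downloads": 40},
    {"forks": 3, "downloads": 120},
    {"forks": 5, "downloads": 900},
    {"forks": 80, "downloads": 1500},
    {"forks": 95, "downloads": 4000},
    {"forks": 110, "downloads": 2500},
]

SUCCESS = "downloads>1000"  # threshold stated in the abstract
centroids = kmeans_1d([p["forks"] for p in projects], k=2)

transactions = []
for p in projects:
    t = {"forks=" + discretize(p["forks"], centroids, ["low", "high"])}
    if p["downloads"] > 1000:
        t.add(SUCCESS)
    transactions.append(t)

rules = rules_for_consequent(transactions, SUCCESS,
                             min_support=0.3, min_confidence=0.8)
for antecedent, consequent, support, confidence in rules:
    print(antecedent, "->", consequent, support, confidence)
```

Fixing the consequent up front mirrors the filtering step described in the abstract: only rules that predict the success item are retained, so rules about other feature combinations are never reported.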
Conference_Titel :
Information, Intelligence, Systems and Applications (IISA), 2015 6th International Conference on
DOI :
10.1109/IISA.2015.7388026