DocumentCode :
735352
Title :
Big data analytics on large-scale socio-technical software engineering archives
Author :
Bayati, Shahabedin ; Parsons, David ; Susnjak, Teo ; Heidary, Marzieh
Author_Institution :
Sch. of Eng. & Adv. Technol., Massey Univ., Auckland, New Zealand
fYear :
2015
fDate :
27-29 May 2015
Firstpage :
65
Lastpage :
69
Abstract :
Given the fast-growing nature of software engineering data in online software repositories and open source communities, it would be helpful to analyse these assets to discover valuable information about the software engineering development process and other related data. Big Data Analytics (BDA) techniques and frameworks can be applied to these data resources to achieve high-performance, relevant data collection and analysis. Software engineering is a socio-technical process that requires development team collaboration and technical knowledge to produce a high-quality application. GitHub, as an online social coding foundation, contains valuable information about software engineers' communications and project life cycles. In this paper, unsupervised data mining techniques are applied to data collected by general Big Data approaches to analyse GitHub projects, source codes and interactions. Source codes and projects are clustered using features and metrics derived from historical data in repositories, object-oriented programming metrics and the influence of developers on source codes.
Keywords :
Big Data; data analysis; data mining; object-oriented programming; public domain software; software engineering; software metrics; source code (software); unsupervised learning; BDA techniques; GitHub projects; big data analytics techniques; big data approach; high-quality application; large-scale socio-technical software engineering archives; object oriented programming metrics; online social coding foundation; online software repositories; open source communities; project life cycles; socio-technical process; software engineer communications; software engineering data; software engineering development process; source codes; team collaboration; technical knowledge; unsupervised data mining techniques; Big data; Data mining; Encoding; Feature extraction; Measurement; Software; Software engineering; Big Data; Clustering; Empirical Software Engineering; GitHub Mining; Mining Software Repositories (MSR);
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information and Communication Technology (ICoICT), 2015 3rd International Conference on
Conference_Location :
Nusa Dua
Type :
conf
DOI :
10.1109/ICoICT.2015.7231398
Filename :
7231398