Title :
A Linked Data platform for mining software repositories
Author :
Keivanloo, Iman ; Forbes, Christopher ; Hmood, Aseel ; Erfani, Mostafa ; Neal, Christopher ; Peristerakis, George ; Rilling, Juergen
Author_Institution :
Dept. of Comput. Sci. & Software Eng., Concordia Univ., Montreal, QC, Canada
Abstract :
Mining software repositories involves extracting both basic and value-added information from existing software repositories. Different stakeholders (e.g., researchers, managers) mine these repositories to extract facts for various purposes. To avoid unnecessary pre-processing and analysis steps, both basic and value-added facts need to be shared and integrated. In this research, we introduce SeCold, an open and collaborative platform for sharing software datasets. SeCold provides the first online software ecosystem Linked Data platform that supports data extraction and on-the-fly inter-dataset integration from major version control, issue tracking, and quality evaluation systems. In its first release, the dataset contains about two billion facts, such as source code statements, software licenses, and code clones, from 18 000 software projects. In its second release, the SeCold project will contain additional facts mined from issue trackers and versioning systems. Our approach is based on the same fundamental principle as Wikipedia: researchers and tool developers share analysis results obtained from their tools by publishing them as part of the SeCold portal, thereby making them an integrated part of the global knowledge domain. The SeCold project is an official member of the Linked Data dataset cloud and is currently the eighth largest online dataset available on the Web.
Keywords :
data mining; software packages; code clones; collaborative platform; linked data platform; mining software repositories; on-the-fly inter-dataset integration; online software ecosystem linked data platform; software datasets; software licenses; software repositories; source code statements; value added information; Cloning; Communities; Data mining; Encyclopedias; Licenses; Ontologies; Software; Linked Data; fact sharing; software mining
Conference_Title :
Mining Software Repositories (MSR), 2012 9th IEEE Working Conference on
Conference_Location :
Zurich
Print_ISBN :
978-1-4673-1760-3
DOI :
10.1109/MSR.2012.6224296