DocumentCode
732086
Title
Unsupervised Software Categorization Using Bytecode
Author
Escobar-Avila, Javier ; Linares-Vasquez, Mario ; Haiduc, Sonia
fYear
2015
fDate
18-19 May 2015
Firstpage
229
Lastpage
239
Abstract
Automatic software categorization is the task of assigning software systems or libraries to categories based on their functionality. Correctly assigning these categories is essential to ensure that developers can easily retrieve relevant software from large repositories. State-of-the-art approaches either rely on the availability of source code or use supervised machine learning, which requires a set of already labeled software as training data. As a result, current approaches fail when such information is not available. We propose a novel approach that overcomes these limitations by using semantic information recovered from bytecode and an unsupervised algorithm to assign categories to software systems. We evaluated our approach in a study on the Apache Foundation Repository of Java libraries, and the results indicate that our approach correctly identifies a category for 86% of the libraries.
Keywords
Accuracy; Clustering algorithms; Data mining; Java; Software; Software libraries; bytecode; clustering; Dirichlet process; software categorization; software profiles
fLanguage
English
Publisher
IEEE
Conference_Title
Program Comprehension (ICPC), 2015 IEEE 23rd International Conference on
Conference_Location
Florence, Italy
Type
conf
DOI
10.1109/ICPC.2015.33
Filename
7181451