Title :
Automatic Categorization of Software Libraries Using Bytecode
Author :
Escobar-Avila, Javier
Author_Institution :
Dept. of Comput. Sci., Florida State Univ., Tallahassee, FL, USA
Abstract :
Automatic software categorization is the task of assigning categories or tags to software libraries in order to summarize their functionality. Correctly assigning these categories is essential to ensure that developers can easily retrieve relevant libraries from large repositories. Current categorization approaches either rely on the semantics reflected in the source code or use supervised machine learning techniques, which require a set of labeled software as training data. These approaches fail when such information is not available. We propose a novel unsupervised approach for the automatic categorization of Java libraries, which uses the bytecode of a library to determine its category. We show that the approach is able to successfully categorize libraries from the Apache Foundation Repository.
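The abstract describes the approach only at a high level. As a rough illustration of how unsupervised categorization over bytecode-derived terms can work, the Python sketch below assumes that class and method names have already been extracted from each library's `.class` files (for example from `javap` output; the extraction step is not shown). It splits those identifiers into terms, builds an L2-normalised bag-of-words vector per library, and clusters the vectors with DP-means, a hard-clustering relative of Dirichlet process mixtures (a term listed among the keywords). DP-means is a substitute technique chosen for brevity, not the authors' model; the stop list `STOP`, the penalty `lam`, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stop list: package prefixes and accessor verbs with little topical signal.
STOP = {"org", "com", "net", "java", "apache", "get", "set", "is", "to"}


def split_identifier(name):
    """Split a JVM name such as 'org/apache/http/HttpClientBuilder' into lowercase terms."""
    return [p.lower() for p in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)]


def library_vectors(libraries):
    """Map {library: [identifiers]} to sorted names, a vocabulary, and unit-length term vectors."""
    bags = {
        lib: Counter(t for ident in idents for t in split_identifier(ident) if t not in STOP)
        for lib, idents in libraries.items()
    }
    vocab = sorted(set().union(*bags.values()))
    names = sorted(bags)
    vectors = []
    for lib in names:
        v = [float(bags[lib][t]) for t in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return names, vocab, vectors


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def dp_means(vectors, lam, max_iter=50):
    """DP-means (Kulis & Jordan, 2012): k-means that opens a new cluster whenever a
    point's squared distance to every existing centroid exceeds the penalty lam."""
    centroids = [mean(vectors)]  # start with a single cluster at the global mean
    assign = [0] * len(vectors)
    for _ in range(max_iter):
        prev = list(assign)
        for i, v in enumerate(vectors):
            dists = [sq_dist(v, c) for c in centroids]
            best = min(range(len(dists)), key=dists.__getitem__)
            if dists[best] > lam:  # too far from every cluster: open a new one here
                centroids.append(list(v))
                best = len(centroids) - 1
            assign[i] = best
        # Drop clusters that lost all members, relabel densely, recompute centroids.
        relabel = {old: new for new, old in enumerate(sorted(set(assign)))}
        assign = [relabel[a] for a in assign]
        centroids = [mean([v for v, a in zip(vectors, assign) if a == k]) for k in range(len(relabel))]
        if assign == prev:
            break
    return assign, centroids


def label_clusters(centroids, vocab, k=2):
    """Tag each cluster with the k highest-weighted terms of its centroid."""
    labels = []
    for c in centroids:
        ranked = sorted(zip(vocab, c), key=lambda tw: (-tw[1], tw[0]))
        labels.append([t for t, w in ranked[:k] if w > 0])
    return labels


if __name__ == "__main__":
    # Toy input: identifiers assumed to have been read from each library's bytecode.
    libs = {
        "io-a": ["FileReader", "FileWriter"],
        "io-b": ["FileReader", "FileCopier"],
        "json-a": ["JsonParser", "JsonWriter"],
        "json-b": ["JsonParser", "JsonMapper"],
    }
    names, vocab, vectors = library_vectors(libs)
    assign, centroids = dp_means(vectors, lam=0.5)
    tags = label_clusters(centroids, vocab)
    for name, a in zip(names, assign):
        print(name, tags[a])
```

With `lam=0.5`, the four toy libraries fall into two clusters tagged `['file', 'reader']` and `['json', 'parser']`. No labeled training data is involved, and the number of categories is not fixed in advance: raising `lam` yields fewer, broader categories, while lowering it splits libraries into more specific ones.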
Keywords :
Java; software libraries; source code (software); unsupervised learning; Apache Foundation Repository; Java libraries; automatic software library categorization; bytecode; source code; unsupervised approach; Accuracy; Conferences; Data mining; Semantics; Software; Software libraries; automatic labeling; bytecode; clustering; Dirichlet process; software categorization;
Conference_Title :
Software Engineering (ICSE), 2015 IEEE/ACM 37th IEEE International Conference on
Conference_Location :
Florence
DOI :
10.1109/ICSE.2015.249