• DocumentCode
    732086
  • Title

    Unsupervised Software Categorization Using Bytecode

  • Author

    Escobar-Avila, Javier ; Linares-Vasquez, Mario ; Haiduc, Sonia

  • fYear
    2015
  • fDate
    18-19 May 2015
  • Firstpage
    229
  • Lastpage
    239
  • Abstract
    Automatic software categorization is the task of assigning software systems or libraries to categories based on their functionality. Correctly assigning these categories is essential to ensure that relevant software can be easily retrieved by developers from large repositories. State-of-the-art approaches either rely on the availability of the source code or use supervised machine learning approaches, which require a set of already labeled software as training data. These restrictions make current approaches fail when such information is not available. We propose a novel approach, which overcomes these limitations by using semantic information recovered from bytecode and an unsupervised algorithm to assign categories to software systems. We evaluated our approach in a study on the Apache Foundation Repository of Java libraries, and the results indicate that our approach is able to identify a correct category for 86% of the libraries.
  • Keywords
    Accuracy; Clustering algorithms; Data mining; Java; Software; Software libraries; bytecode; clustering; dirichlet process; software categorization; software profiles;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Program Comprehension (ICPC), 2015 IEEE 23rd International Conference on
  • Conference_Location
    Florence, Italy
  • Type

    conf

  • DOI
    10.1109/ICPC.2015.33
  • Filename
    7181451