  • DocumentCode
    2824818
  • Title
    Investigating the use of lexical information for software system clustering
  • Author
    Corazza, Anna ; Di Martino, Sergio ; Maggio, V. ; Scanniello, Giuseppe
  • Author_Institution
    Dipt. di Sci. Fisiche Sezione Inf., Univ. of Naples Federico II, Naples, Italy
  • fYear
    2011
  • fDate
    1-4 March 2011
  • Firstpage
    35
  • Lastpage
    44
  • Abstract
    Developers have a lot of freedom in writing comments as well as in choosing identifiers and method names. These choices are intentional in nature, and they differ in how relevant they are to understanding what a software system implements and, in particular, the role of each source file. In this paper, we investigate the effectiveness of exploiting lexical information for software system clustering. In particular, we explore the contribution of the combined use of six different dictionaries, corresponding to the six parts of the source code where programmers introduce lexical information, namely: class, attribute, method, and parameter names, comments, and source code statements. Their relevance has been weighted by means of a probabilistic model whose parameters have been estimated by the Expectation-Maximization algorithm. To group source files accordingly, we used a hierarchical clustering algorithm. The investigation has been conducted on a dataset of 13 open-source Java software systems.
  • Keywords
    expectation-maximisation algorithm; object-oriented programming; pattern clustering; probability; attribute; class; comments; dictionaries; expectation-maximization algorithm; hierarchical clustering algorithm; lexical information; method; parameter names; probabilistic model; software system clustering; source code statements; Clustering algorithms; Java; Partitioning algorithms; Probabilistic logic; Software algorithms; Software systems; Clustering; Lexical Information; Probabilistic Model; Software Remodularization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Software Maintenance and Reengineering (CSMR), 2011 15th European Conference on
  • Conference_Location
    Oldenburg
  • ISSN
    1534-5351
  • Print_ISBN
    978-1-61284-259-2
  • Type
    conf
  • DOI
    10.1109/CSMR.2011.8
  • Filename
    5741257