DocumentCode :
2824818
Title :
Investigating the use of lexical information for software system clustering
Author :
Corazza, Anna ; Di Martino, Sergio ; Maggio, V. ; Scanniello, Giuseppe
Author_Institution :
Dipt. di Sci. Fisiche Sezione Inf., Univ. of Naples Federico II, Naples, Italy
fYear :
2011
fDate :
1-4 March 2011
Firstpage :
35
Lastpage :
44
Abstract :
Developers have considerable freedom in writing comments and in choosing identifiers and method names. These choices are intentional and carry information of varying relevance for understanding what a software system implements and, in particular, the role of each source file. In this paper we investigate the effectiveness of exploiting lexical information for software system clustering. In particular, we explore the contribution of the combined use of six different dictionaries, corresponding to the six parts of the source code where programmers introduce lexical information: class, attribute, method and parameter names, comments, and source code statements. The relevance of each dictionary has been weighted by means of a probabilistic model, whose parameters have been estimated by the Expectation-Maximization algorithm. To group source files accordingly, we used a hierarchical clustering algorithm. The investigation has been conducted on a dataset of 13 open source Java software systems.
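The pipeline outlined in the abstract (one lexical dictionary per source-code zone, a relevance weight per zone, then hierarchical clustering of source files) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Per-zone cosine similarity, average-link merging, the uniform weights and the stopping threshold `0.1` are all assumptions. In the paper, the zone weights come from a probabilistic model fitted with Expectation-Maximization, and this sketch does not reproduce that step. The names `ZONES`, `cosine`, `similarity` and `average_link_clustering`, as well as the toy files, are hypothetical.

```python
import math
from collections import Counter

# The six lexical zones named in the abstract (labels are hypothetical).
ZONES = ("class", "attribute", "method", "parameter", "comment", "statement")


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def similarity(f1: dict, f2: dict, weights: dict) -> float:
    """Weighted sum of per-zone similarities.

    Assumption: weights are fixed and sum to 1. The paper instead estimates
    zone relevance with an EM-fitted probabilistic model.
    """
    return sum(
        weights[z] * cosine(f1.get(z, Counter()), f2.get(z, Counter()))
        for z in ZONES
    )


def average_link_clustering(files: dict, weights: dict, threshold: float) -> list:
    """Agglomerative (average-link) clustering of source files.

    Repeatedly merges the two most similar clusters until the best
    average similarity drops below `threshold` (a hypothetical cut-off).
    """
    clusters = [[name] for name in files]

    def link(c1, c2):
        total = sum(similarity(files[a], files[b], weights) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = link(clusters[i], clusters[j])
                if s > best:
                    best, pair = s, (i, j)
        if best < threshold:
            break
        i, j = pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy input: each file maps a zone to the bag of terms found there.
files = {
    "Parser.java": {"class": Counter(["parser"]), "method": Counter(["parse", "token"]),
                    "comment": Counter(["parse", "input"])},
    "Lexer.java": {"class": Counter(["lexer"]), "method": Counter(["next", "token"]),
                   "comment": Counter(["token", "input"])},
    "Db.java": {"class": Counter(["db"]), "method": Counter(["query", "connect"]),
                "comment": Counter(["database", "query"])},
    "Dao.java": {"class": Counter(["dao"]), "method": Counter(["query", "save"]),
                 "comment": Counter(["database", "record"])},
}
weights = {z: 1 / len(ZONES) for z in ZONES}  # uniform, hypothetical

result = average_link_clustering(files, weights, threshold=0.1)
print(result)  # [['Parser.java', 'Lexer.java'], ['Db.java', 'Dao.java']]
```

With uniform weights, the parsing-related files and the persistence-related files end up in separate groups because they share method and comment vocabulary. Replacing the fixed `weights` with per-zone values learned from data is where a model such as the one described in the abstract would come in.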
Keywords :
expectation-maximisation algorithm; object-oriented programming; pattern clustering; probability; attribute; class; comments; dictionaries; expectation-maximization algorithm; hierarchical clustering algorithm; lexical information; method; parameter names; probabilistic model; software system clustering; source code statements; Clustering algorithms; Java; Partitioning algorithms; Probabilistic logic; Software algorithms; Software systems; Clustering; Lexical Information; Probabilistic Model; Software Remodularization;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Software Maintenance and Reengineering (CSMR), 2011 15th European Conference on
Conference_Location :
Oldenburg
ISSN :
1534-5351
Print_ISBN :
978-1-61284-259-2
Type :
conf
DOI :
10.1109/CSMR.2011.8
Filename :
5741257