DocumentCode :
2008754
Title :
An Application of Latent Dirichlet Allocation to Analyzing Software Evolution
Author :
Linstead, Erik ; Lopes, Cristina ; Baldi, Pierre
Author_Institution :
Bren Sch. of Inf. & Comput. Sci., Univ. of California, Irvine, CA, USA
fYear :
2008
fDate :
11-13 Dec. 2008
Firstpage :
813
Lastpage :
818
Abstract :
We develop and apply unsupervised statistical topic models, in particular latent Dirichlet allocation, to identify functional components of source code and study their evolution over multiple project versions. We present results for two large, open source Java projects, Eclipse and Argo UML, which are well-known and well-studied within the software mining community. Our results demonstrate the effectiveness of probabilistic topic models in automatically summarizing the temporal dynamics of software concerns, with direct application to project management and program understanding. In addition to detecting the emergence of topics on the release timeline that represent integration points for key source code functionality, our techniques can also be used to pinpoint refactoring events in the underlying software design, as well as to identify general programming concepts whose prevalence depends only on the size of the code base being analyzed. Complete results are available from our supplementary materials website at http://sourcerer.ics.uci.edu/icmla2008/software_evolution.html.
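The abstract does not spell out the modeling pipeline. The following is a minimal, hypothetical Python sketch of the general idea, under these assumptions: each source file is treated as a bag of identifier sub-tokens, a single LDA model is fit by collapsed Gibbs sampling over files from all releases, and a topic's prevalence in a release is the share of that release's tokens assigned to it. The function names (`tokenize`, `lda_gibbs`, `topic_prevalence_by_version`), the stop-word list, and the hyperparameters `alpha` and `beta` are illustrative assumptions. They are not the authors' implementation.

```python
import random
import re

# Hypothetical stop list: Java keywords that carry little topical signal.
STOP = {"public", "private", "protected", "static", "final", "void", "int",
        "return", "new", "class", "import", "package", "if", "else", "for"}


def tokenize(source):
    """Split identifiers into lowercase sub-tokens, e.g. getUIManager -> get, ui, manager."""
    tokens = []
    for ident in re.findall(r"[A-Za-z]+", source):
        # Split camelCase and acronym runs such as "UI" in "getUIManager".
        for part in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", ident):
            if part.lower() not in STOP:
                tokens.append(part.lower())
    return tokens


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA over a list of token lists.

    Returns (ndk, nkw, vocab): per-document topic counts, per-topic word
    counts, and the sorted vocabulary indexing the columns of nkw.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    w_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]        # tokens in doc d assigned to topic k
    nkw = [[0] * V for _ in range(K)]    # occurrences of word v assigned to topic k
    nk = [0] * K                         # total tokens assigned to topic k
    z = []                               # current topic of every token

    # Random initial assignment of each token to a topic.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w_id[w]] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = w_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z_i = t | rest), up to a constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                # Record the resampled topic and restore the counts.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    return ndk, nkw, vocab


def topic_prevalence_by_version(versions, ndk):
    """versions[d] is the release label of document d; returns {release: topic shares}."""
    totals = {}
    for release, counts in zip(versions, ndk):
        acc = totals.setdefault(release, [0] * len(counts))
        for k, c in enumerate(counts):
            acc[k] += c
    shares = {}
    for release, acc in totals.items():
        n = sum(acc) or 1  # guard against releases with no tokens
        shares[release] = [c / n for c in acc]
    return shares
```

For example, tokenizing every file of several releases, fitting `lda_gibbs` on the combined corpus, and comparing `topic_prevalence_by_version(...)` across releases shows which topics grow or shrink over time. In this sketch, a sudden rise in one topic's share would be one candidate signal for the kind of integration point the abstract mentions. Collapsed Gibbs sampling is used here only because it fits in a few dozen lines of standard-library Python. The abstract does not state which inference method the authors used.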
Keywords :
public domain software; software engineering; statistical analysis; Argo UML; Eclipse; latent Dirichlet allocation; open source Java projects; software design; software evolution analysis; software mining community; unsupervised statistical topic models; Application software; Computer bugs; History; Information analysis; Java; Linear discriminant analysis; Machine learning; Open source software; Project management; Software engineering; latent dirichlet allocation; software evolution; software mining; topic models;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Machine Learning and Applications, 2008. ICMLA '08. Seventh International Conference on
Conference_Location :
San Diego, CA
Print_ISBN :
978-0-7695-3495-4
Type :
conf
DOI :
10.1109/ICMLA.2008.47
Filename :
4725072