DocumentCode :
1974858
Title :
Using Latent Dirichlet Allocation for automatic categorization of software
Author :
Tian, Kai ; Revelle, Meghan ; Poshyvanyk, Denys
Author_Institution :
Comput. Sci. Dept., Coll. of William & Mary, Williamsburg, VA
fYear :
2009
fDate :
16-17 May 2009
Firstpage :
163
Lastpage :
166
Abstract :
In this paper, we propose a technique called LACT for automatically categorizing software systems in open-source repositories. LACT is based on Latent Dirichlet Allocation, an information retrieval method used to index and analyze source code documents as mixtures of probabilistic topics. For an initial evaluation, we performed two studies. In the first study, LACT was compared against an existing tool, MUDABlue, for classifying 41 software systems written in C into problem domain categories. The results indicate that LACT can automatically produce meaningful category names and yield classification results comparable to MUDABlue. In the second study, we applied LACT to 43 software systems written in different programming languages such as C/C++, Java, C#, PHP, and Perl. The results indicate that LACT can be used effectively for the automatic categorization of software systems regardless of the underlying programming language or paradigm. Moreover, both studies indicate that LACT can identify several new categories that are based on libraries, architectures, or programming languages, which is a promising improvement over manual categorization and existing techniques.
Keywords :
C language; information retrieval; public domain software; software architecture; software libraries; C language; LACT technique; MUDABlue; automatic categorization software system; information retrieval method; latent Dirichlet allocation; open-source repository; probabilistic topic model; programming language; software architecture; software library; source code document; Computer architecture; Computer languages; Information analysis; Information retrieval; Java; Manuals; Open source software; Performance evaluation; Software libraries; Software systems;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Mining Software Repositories, 2009. MSR '09. 6th IEEE International Working Conference on
Conference_Location :
Vancouver, BC
Print_ISBN :
978-1-4244-3493-0
Type :
conf
DOI :
10.1109/MSR.2009.5069496
Filename :
5069496