DocumentCode :
735379
Title :
Source code retrieval on StackOverflow using LDA
Author :
Arwan, Achmad ; Rochimah, Siti ; Akbar, Rizky Januar
Author_Institution :
Dept. of Inf., Univ. of Brawijaya, Malang, Indonesia
fYear :
2015
fDate :
27-29 May 2015
Firstpage :
295
Lastpage :
299
Abstract :
Internet code search is a popular research area. StackOverflow allows developers to ask and answer questions about code. A previous approach to searching code on StackOverflow uses the tf-idf method, which recommends source code based on the number of occurrences of words. This method has the disadvantage that variable and method identifiers are treated as normal words, even though identifiers are often a combination of two or more words. For example, given an identifier named “randomString”, a search for the keyword “random” will probably not recommend “randomString” because the two words are different. Concept location can tackle this problem. Concept location has been widely used to obtain the correlation between code and specific concepts or features. Previous research on concept location focused only on source code comments and the relations among objects within the source code. This research proposes a mechanism for finding code on StackOverflow that uses Latent Dirichlet Allocation (LDA), with concept location in the preprocessing stage. Questions, answers, and code snippets about Java programming are downloaded from StackOverflow to a local repository. Corpora are generated by extracting questions, answers, and code snippets. Concept locations are inferred from the source code using the LDA algorithm. Developers query concepts, and the system then recommends source code based on the relevant concepts. The experimental results show that the system is able to recommend source code with an average precision of 48% and an average recall of 58%.
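The “randomString” example in the abstract can be illustrated with a small sketch. The abstract does not state how the authors tokenize identifiers, so the camelCase/snake_case splitting below, and the helper name `split_identifier`, are assumptions chosen for illustration and not the paper's method. The sketch shows why whole-identifier matching misses the keyword “random”, while matching on the identifier's component words finds it.

```python
import re

def split_identifier(identifier):
    """Split a camelCase or snake_case identifier into lowercase word tokens.

    Hypothetical helper for illustration; not taken from the paper.
    """
    tokens = []
    # Break on underscores first, then on case transitions and digit runs.
    for part in re.split(r"_+", identifier):
        # Acronym runs (e.g. "XML"), capitalized or lowercase words, digits.
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [token.lower() for token in tokens]

# Whole-word matching treats "randomString" as a single term,
# so the query "random" does not match it.
query = "random"
identifier = "randomString"
exact_match = (query == identifier.lower())

# After splitting, the query matches one of the component words.
split_match = query in split_identifier(identifier)
```

Here `exact_match` is false and `split_match` is true, which is the mismatch the abstract attributes to treating identifiers as normal words.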
Keywords :
Java; query processing; question answering (information retrieval); recommender systems; source code (software); Internet code search; Java programming; LDA algorithm; StackOverflow; code snippets; concept location; corpus generation; latent Dirichlet allocation; local repository; method identifiers; normal words; precision value; query concepts; question answering; question asking; randomString identifier; recall value; source code recommendation; source code retrieval; tf-idf method; variable identifiers; word occurrences; Conferences; Data preprocessing; Java; Programming; Resource management; Software; Concept Location; Latent Dirichlet Allocation; Source Code Searching;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information and Communication Technology (ICoICT), 2015 3rd International Conference on
Conference_Location :
Nusa Dua
Type :
conf
DOI :
10.1109/ICoICT.2015.7231439
Filename :
7231439