Title :
My Repository Runneth Over: An Empirical Study on Diversifying Data Sources to Improve Feature Search
Author :
Ratanotayanon, Sukanya ; Choi, Hye Jung ; Sim, Susan Elliott
Author_Institution :
Dept. of Inf., Univ. of California, Irvine, CA, USA
fDate :
June 30, 2010 - July 2, 2010
Abstract :
Research on feature location that applies information retrieval techniques has experimented with the kinds of inputs to the corpus and the algorithms that could be used. At first, only source code was used. Later, extraction techniques were improved, and data from other software tools and analyses were used to expand or augment the repository. But does having more diverse data in the repository always produce better results? In this paper, we report on an empirical study that examines the effect of increasing data diversity on feature location through search. In particular, we looked at the effect of including: i) change sets from a revision control system, ii) tickets from issue trackers, and iii) elements from a Static Dependency Graph (SDG). We searched for three features of Jajuk, an open source Java jukebox, and two features of jEdit, an open source Java text editor. We used four different corpora built from combinations of the above data. We used Eclipse's code search and an index built from source code as baseline conditions. We found that it is not always better to have more diverse data. Adding SDG data to change sets increased recall but drove down precision. Adding data from issue trackers had little effect and, in one case, lowered recall. We also found that large-scale refactoring of the code decreases the effectiveness of using change sets for feature location.
Keywords :
Java; data analysis; information retrieval; public domain software; software maintenance; Eclipse code search; Jajuk; data diversity; data source analysis; extraction techniques; feature location set research; information retrieval techniques; jEdit; large-scale refactoring; open source Java jukebox; open source Java text editor; revision control system; software tools; source code; static dependency graph; Control systems; Data mining; Informatics; Information retrieval; Java; Large-scale systems; Software performance; Software tools; USA Councils; Vocabulary; change sets; code search; component; feature location; program comprehension;
Conference_Title :
Program Comprehension (ICPC), 2010 IEEE 18th International Conference on
Conference_Location :
Braga, Minho
Print_ISBN :
978-1-4244-7604-6
ISSN :
1092-8138
DOI :
10.1109/ICPC.2010.33