REPENT: Analyzing the Nature of Identifier Renamings

Author

Arnaoudova, V. ; Eshkevari, Laleh M. ; Di Penta, Massimiliano ; Oliveto, Rocco ; Antoniol, Giuliano ; Gueheneuc, Yann-Gael

Author_Institution

Polytech. Montreal, Montreal, QC, Canada

Volume

40

Issue

5

fYear

2014

fDate

May-14

Firstpage

502

Lastpage

532

Abstract

Source code lexicon plays a paramount role in software quality: poor lexicon can lead to poor comprehensibility and even increase software fault-proneness. For this reason, renaming a program entity, i.e., altering the entity identifier, is an important activity during software evolution. Developers rename when they feel that the name of an entity is no longer consistent with its functionality, or when such a name may be misleading. A survey that we performed with 71 developers suggests that 39 percent perform renaming from a few times per week to almost every day and that 92 percent of the participants consider that renaming is not straightforward. However, despite the cost that is associated with renaming, renamings are seldom if ever documented: for example, less than 1 percent of the renamings in the five programs that we studied are documented. This explains why participants largely agree on the usefulness of automatically documenting renamings. In this paper we propose REnaming Program ENTities (REPENT), an approach to automatically document (that is, detect and classify) identifier renamings in source code. REPENT detects renamings based on a combination of source code differencing and data flow analyses. Using a set of natural language tools, REPENT classifies renamings into the different dimensions of a taxonomy that we defined. Using the documented renamings, developers will be able to, for example, look up methods that are part of the public API (as they impact client applications), or look for inconsistencies between the name and the implementation of an entity that underwent a high-risk renaming (e.g., towards the opposite meaning). We evaluate the accuracy and completeness of REPENT on the evolution history of five open-source Java programs. The study indicates a precision of 88 percent and a recall of 92 percent. In addition, we report an exploratory study investigating and discussing how identifiers are renamed in the five programs, according to our taxonomy.

Keywords

data flow analysis; pattern classification; software fault tolerance; software quality; source code (software); REPENT; entity identifier; identifier renaming analysis; natural language tools; open-source Java programs; program entity renaming; public API; renaming program entities; software evolution; software fault-proneness; source code lexicon; taxonomy dimensions; Documentation; Grammar; History; Java; Semantics; Software; Taxonomy; Identifier renaming; empirical study; mining software repositories; program comprehension; refactoring

fLanguage

English

Journal_Title

Software Engineering, IEEE Transactions on

Publisher

IEEE

ISSN

0098-5589

Type

jour

DOI

10.1109/TSE.2014.2312942

Filename

6776542