Title :
Automatically mining software-based, semantically-similar words from comment-code mappings
Author :
Howard, Matthew J. ; Gupta, Swastik ; Pollock, Lori ; Vijay-Shanker, K.
Author_Institution :
Dept. of Comput. & Inf. Sci., Univ. of Delaware, Newark, DE, USA
Abstract :
Many software development and maintenance tools involve matching between natural language words in different software artifacts (e.g., traceability) or between queries submitted by a user and software artifacts (e.g., code search). Because different people likely created the queries and the various artifacts, the effectiveness of these tools is often improved by expanding queries and adding related words to textual artifact representations. Synonyms, as well as other word relations that indicate semantic similarity, are particularly useful for overcoming the mismatch in vocabularies. However, experience shows that many words are semantically similar in computer science situations, but not in typical natural language documents. In this paper, we present an automatic technique to mine semantically similar words, particularly in the software context. We leverage the role of leading comments for methods and programmer conventions in writing them. Our evaluation shows that the mined related comment-code word mappings that do not already occur in WordNet are indeed viewed as computer science, semantically-similar word pairs in high proportions.
Keywords :
data mining; natural language processing; software maintenance; software tools; text analysis; WordNet; automatic mining software-based semantically-similar word mining; comment-code word mapping; computer science; natural language documents; natural language words; software artifacts; software development tools; software maintenance tools; software traceability; synonyms; textual artifact representation; vocabulary mismatch; Computer science; Context; Data mining; Maintenance engineering; Semantics; Software; Tagging
Conference_Title :
Mining Software Repositories (MSR), 2013 10th IEEE Working Conference on
Conference_Location :
San Francisco, CA
Print_ISBN :
978-1-4799-0345-0
DOI :
10.1109/MSR.2013.6624052