DocumentCode
1983961
Title
Exploring Java software vocabulary: A search and mining perspective
Author
Linstead, Erik ; Hughes, Lindsey ; Lopes, Cristina ; Baldi, Pierre
Author_Institution
Sch. of Inf. & Comput. Sci., Univ. of California, Irvine, CA
fYear
2009
fDate
16 May 2009
Firstpage
29
Lastpage
32
Abstract
We conduct a large-scale analysis of Java source code vocabulary for 12,151 open source projects from SourceForge and Apache, a corpus substantially larger than considered previously. Simple statistical analysis demonstrates robust power-law behavior for word count distributions across multiple program entities. We then identify salient vocabulary trends for classes, interfaces, methods, and fields. Our results provide low-level insight into the vocabulary space governing Java software development, with direct application to program comprehension and software search. Supplementary material may be found at: http://sourcerer.ics.uci.edu/suite2009/suite.html.
Keywords
Java; data mining; statistical analysis; Apache; Java software development; Java software vocabulary; Java source code vocabulary; SourceForge; large-scale analysis; mining perspective; multiple program entities; program comprehension; robust power-law behavior; search perspective; software search; statistical analysis; word count distribution; Application software; Computer languages; Information retrieval; Internet; Java; Large-scale systems; Natural languages; Software tools; Statistical analysis; Vocabulary
fLanguage
English
Publisher
IEEE
Conference_Title
ICSE Workshop on Search-Driven Development-Users, Infrastructure, Tools and Evaluation, 2009 (SUITE '09)
Conference_Location
Vancouver, BC
Print_ISBN
978-1-4244-3740-5
Type
conf
DOI
10.1109/SUITE.2009.5070017
Filename
5070017