• DocumentCode
    1983961
  • Title
    Exploring Java software vocabulary: A search and mining perspective
  • Author
    Linstead, Erik ; Hughes, Lindsey ; Lopes, Cristina ; Baldi, Pierre
  • Author_Institution
    Sch. of Inf. & Comput. Sci., Univ. of California, Irvine, CA
  • fYear
    2009
  • fDate
    16-16 May 2009
  • Firstpage
    29
  • Lastpage
    32
  • Abstract
    We conduct a large-scale analysis of Java source code vocabulary for 12,151 open source projects from SourceForge and Apache, a corpus substantially larger than considered previously. Simple statistical analysis demonstrates robust power-law behavior for word count distributions across multiple program entities. We then identify salient vocabulary trends for classes, interfaces, methods, and fields. Our results provide low-level insight into the vocabulary space governing Java software development, with direct application to program comprehension and software search. Supplementary material may be found at: http://sourcerer.ics.uci.edu/suite2009/suite.html.
  • Keywords
    Java; data mining; statistical analysis; Apache; Java software development; Java software vocabulary; Java source code vocabulary; SourceForge; large-scale analysis; mining perspective; multiple program entities; program comprehension; robust power-law behavior; search perspective; software search; word count distribution; Application software; Computer languages; Information retrieval; Internet; Large-scale systems; Natural languages; Software tools; Vocabulary
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Search-Driven Development-Users, Infrastructure, Tools and Evaluation, 2009. SUITE '09. ICSE Workshop on
  • Conference_Location
    Vancouver, BC
  • Print_ISBN
    978-1-4244-3740-5
  • Type
    conf
  • DOI
    10.1109/SUITE.2009.5070017
  • Filename
    5070017