• DocumentCode
    3694242
  • Title

    Query by example in large-scale code repositories

  • Author

    Vipin Balachandran

  • Author_Institution
    VMware, Bangalore, India
  • fYear
    2015
  • Firstpage
    467
  • Lastpage
    476
  • Abstract
    Searching for code samples in a code repository is an important part of program comprehension. Most existing code search tools support syntactic element search and regular expression pattern search. However, they are text-based and hence cannot handle queries that are syntactic patterns. Solutions proposed for querying syntactic patterns rely on specialized query languages, which present a steep learning curve for users. Querying would be more user-friendly if the syntactic pattern could be formulated in the underlying programming language (as a sample code snippet) instead of a specialized query language. In this paper, we propose a solution to the query-by-example problem using Abstract Syntax Tree (AST) structural similarity matching. The query snippet is converted to an AST, its subtrees are compared against AST subtrees of source files in the repository, and the similarity values of matching subtrees are aggregated into a relevance score for each source file. To scale this approach to large code repositories, we use locality-sensitive hash functions and numerical vector approximation of trees. Our experimental evaluation involves running control queries against a real project. The results show that our algorithm can achieve high precision (0.73) and recall (0.81) and scale to large code repositories without compromising quality.
  • Keywords
    "Vegetation","Syntactics","Euclidean distance","Approximation methods","Search engines","Java","Approximation algorithms"
  • Publisher
    IEEE
  • Conference_Titel
    Software Maintenance and Evolution (ICSME), 2015 IEEE International Conference on
  • Type

    conf

  • DOI
    10.1109/ICSM.2015.7332498
  • Filename
    7332498
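
  • Illustrative sketch (not part of the original record)
    The matching pipeline summarized in the abstract (snippet to AST, subtree comparison, aggregation into a per-file relevance score) might be sketched roughly as follows. This is a minimal illustration under several assumptions, not the paper's implementation: it uses Python's standard `ast` module rather than a Java parser, approximates each subtree by a vector of node-type counts, turns Euclidean distance into a similarity with 1 / (1 + distance), and sums the best match per query subtree. The function names, the `min_nodes` threshold, and the similarity formula are all hypothetical, and the locality-sensitive hashing step is omitted (every pair of subtrees is compared directly).

```python
import ast
import math
from collections import Counter


def subtree_vectors(source, min_nodes=3):
    """Return one node-type count vector per AST subtree of `source`.

    Subtrees with fewer than `min_nodes` nodes are skipped
    (the threshold is an assumption made for this sketch).
    """
    vectors = []
    for node in ast.walk(ast.parse(source)):
        # Count the node types in the subtree rooted at `node`.
        counts = Counter(type(n).__name__ for n in ast.walk(node))
        if sum(counts.values()) >= min_nodes:
            vectors.append(counts)
    return vectors


def similarity(a, b):
    """Map the Euclidean distance between two count vectors into (0, 1]."""
    keys = set(a) | set(b)
    dist = math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))
    return 1.0 / (1.0 + dist)


def relevance(query, source, min_nodes=3):
    """Sum, over the query's subtrees, the best similarity found in `source`."""
    file_vectors = subtree_vectors(source, min_nodes)
    if not file_vectors:
        return 0.0
    return sum(
        max(similarity(q, f) for f in file_vectors)
        for q in subtree_vectors(query, min_nodes)
    )


if __name__ == "__main__":
    query = "for x in xs:\n    total += x\n"
    files = {
        "sum_loop.py": (
            "def f(xs):\n"
            "    total = 0\n"
            "    for x in xs:\n"
            "        total += x\n"
            "    return total\n"
        ),
        "hello.py": "print('hello')\n",
    }
    # Rank the files by descending relevance to the query snippet.
    for name in sorted(files, key=lambda n: relevance(query, files[n]), reverse=True):
        print(name, round(relevance(query, files[name]), 3))
```

    The pairwise comparison here is quadratic in the number of subtrees. The abstract indicates that the paper addresses this with locality-sensitive hashing over the vector approximations, which this sketch does not attempt to reproduce.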