  • DocumentCode
    2704673
  • Title
    Using latent semantic analysis to identify similarities in source code to support program understanding
  • Author
    Maletic, Jonathan I.; Marcus, Andrian
  • Author_Institution
    Div. of Comput. Sci., Memphis Univ., Memphis, TN, USA
  • fYear
    2000
  • fDate
    2000
  • Firstpage
    46
  • Lastpage
    53
  • Abstract
    The paper describes the results of applying Latent Semantic Analysis (LSA), an advanced information retrieval method, to program source code and associated documentation. Latent semantic analysis is a corpus-based statistical method for inducing and representing aspects of the meanings of words and passages (of natural language) as reflected in their usage. This methodology is assessed for application to the domain of software components (i.e., source code and its accompanying documentation). Here LSA is used as the basis for clustering software components. This clustering is used to assist in the understanding of a nontrivial software system, namely a version of Mosaic. Applying latent semantic analysis to source code and internal documentation in support of program understanding is a new application of this method and a departure from its normal application domain of natural language.
  • Keywords
    computational linguistics; information retrieval; natural languages; reverse engineering; statistical analysis; system documentation; LSA; Mosaic; corpus based statistical method; information retrieval method; internal documentation; latent semantic analysis; natural language; nontrivial software system; program understanding; software component clustering; software components; source code; source code similarities; Application software; Computer architecture; Computer science; Documentation; Information analysis; Information retrieval; Natural languages; Software maintenance; Software systems; Statistical analysis
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Tools with Artificial Intelligence, 2000. ICTAI 2000. Proceedings. 12th IEEE International Conference on
  • Conference_Location
    Vancouver, BC
  • ISSN
    1082-3409
  • Print_ISBN
    0-7695-0909-6
  • Type
    conf
  • DOI
    10.1109/TAI.2000.889845
  • Filename
    889845