• DocumentCode
    3179183
  • Title

    Extracting Source Code from E-Mails

  • Author

    Bacchelli, Alberto ; D'Ambros, Marco ; Lanza, Michele

  • Author_Institution
    REVEAL @ Fac. of Inf., Univ. of Lugano, Lugano, Switzerland
  • fYear
    2010
  • fDate
    June 30 2010-July 2 2010
  • Firstpage
    24
  • Lastpage
    33
  • Abstract
    E-mails, used by developers and system users to communicate over a broad range of topics, offer a valuable source of information. If archived, e-mails can be mined to support program comprehension activities and to provide views of a software system that are alternative and complementary to those offered by the source code. However, e-mails are written in natural language, and therefore contain noise that makes it difficult to retrieve the important data. Thus, before conducting an effective system analysis and extracting data for program comprehension, it is necessary to select the relevant messages, and to expose only the meaningful information. In this work we focus both on classifying e-mails that hold fragments of the source code of a system, and on extracting the source code pieces inside the e-mail. We devised and analyzed a number of lightweight techniques to accomplish these tasks. To assess the validity of our techniques, we manually inspected and annotated a statistically significant number of e-mails from five unrelated open source software systems written in Java. With such a benchmark in place, we measured the effectiveness of each technique in terms of precision and recall.
  • Keywords
    Java; electronic mail; information retrieval; public domain software; source coding; Java; data retrieval; e-mail classification; information source; lightweight techniques; natural language; open source software systems; program comprehension; source code extraction; system analysis; Data analysis; Data mining; Electronic mail; Information analysis; Information resources; Information retrieval; Java; Natural languages; Open source software; Software systems; e-mail archives; source code extraction;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Program Comprehension (ICPC), 2010 IEEE 18th International Conference on
  • Conference_Location
    Braga, Minho
  • ISSN
    1092-8138
  • Print_ISBN
    978-1-4244-7604-6
  • Electronic_ISBN
    1092-8138
  • Type

    conf

  • DOI
    10.1109/ICPC.2010.47
  • Filename
    5521781