• DocumentCode
    1974453
  • Title

    The promises and perils of mining git

  • Author

    Bird, Christian ; Rigby, Peter C. ; Barr, Earl T. ; Hamilton, David J. ; German, Daniel M. ; Devanbu, Prem

  • Author_Institution
    Univ. of California, Davis, CA
  • fYear
    2009
  • fDate
    16-17 May 2009
  • Firstpage
    1
  • Lastpage
    10
  • Abstract
    We are now witnessing the rapid growth of decentralized source code management (DSCM) systems, in which every developer has her own repository. DSCMs facilitate a style of collaboration in which work output can flow sideways (and privately) between collaborators, rather than always up and down (and publicly) via a central repository. Decentralization comes with both the promise of new data and the peril of its misinterpretation. We focus on git, a very popular DSCM used in high-profile projects. Decentralization, and other features of git, such as automatically recorded contributor attribution, lead to richer content histories, giving rise to new questions such as “How do contributions flow between developers to the official project repository?” However, there are pitfalls. Commits may be reordered, deleted, or edited as they move between repositories. The semantics of terms common to SCMs and DSCMs sometimes differ markedly, potentially creating confusion. For example, a commit is immediately visible to all developers in centralized SCMs, but not in DSCMs. Our goal is to help researchers interested in DSCMs avoid these and other perils when mining and analyzing git data.
  • Keywords
    data analysis; data mining; groupware; automatically recorded contributor attribution; central repository; collaboration; decentralization; decentralized source code management systems; mining git; Birds; Collaborative work; Data analysis; History; Kernel; Linux; Mercury (metals); Open source software; Packaging; Rails
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Mining Software Repositories, 2009. MSR '09. 6th IEEE International Working Conference on
  • Conference_Location
    Vancouver, BC
  • Print_ISBN
    978-1-4244-3493-0
  • Type

    conf

  • DOI
    10.1109/MSR.2009.5069475
  • Filename
    5069475