• DocumentCode
    1974557
  • Title

    Tracking concept drift of software projects using defect prediction quality

  • Author

    Ekanayake, Jayalath ; Tappolet, Jonas ; Gall, Harald C. ; Bernstein, Abraham

  • Author_Institution
    Dynamic & Distrib. Syst. Group, Univ. of Zurich, Zurich
  • fYear
    2009
  • fDate
    16-17 May 2009
  • Firstpage
    51
  • Lastpage
    60
  • Abstract
    Defect prediction is an important task in the mining of software repositories, but the quality of predictions varies strongly within and across software projects. In this paper we investigate why the prediction quality fluctuates so strongly, given the changing nature of the bug (or defect) fixing process. To this end, we adopt the notion of concept drift, which denotes that the defect prediction model has become unsuitable as the set of influencing features has changed - usually due to a change in the underlying bug generation process (i.e., the concept). We explore four open source projects (Eclipse, OpenOffice, Netbeans and Mozilla) and construct file-level and project-level features for each of them from their respective CVS and Bugzilla repositories. We then use this data to build defect prediction models and visualize the prediction quality along the time axis. These visualizations allow us to identify concept drifts and - as a consequence - phases of stability and instability expressed in the level of defect prediction quality. Further, we identify the project features that influence the defect prediction quality using both a tree induction algorithm and a linear regression model. Our experiments reveal that software systems are subject to considerable concept drifts in their evolution history. Specifically, we observe that the change in the number of authors editing a file and the number of defects fixed by them contribute to a project's concept drift and therefore influence the defect prediction quality. Our findings suggest that project managers using defect prediction models for decision making should be aware of the current phase of stability or instability due to a potential concept drift.
  • Keywords
    data mining; decision making; decision trees; program debugging; program testing; project management; regression analysis; software maintenance; Bugzilla repository; CVS; Eclipse; Mozilla; Netbeans; OpenOffice; bug generation process; concept drift tracking; decision making; decision tree induction-algorithm; defect prediction quality visualization; linear regression model; open source project; software bug fixing process; software project management; software repository mining; software system evolution history; software testing; Computer architecture; Computer bugs; Data visualization; History; Insulation life; Prediction algorithms; Predictive models; Software quality; Software systems; Stability;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Mining Software Repositories, 2009. MSR '09. 6th IEEE International Working Conference on
  • Conference_Location
    Vancouver, BC
  • Print_ISBN
    978-1-4244-3493-0
  • Type

    conf

  • DOI
    10.1109/MSR.2009.5069480
  • Filename
    5069480