  • DocumentCode
    3060883
  • Title
    Predicting Faults in High Assurance Software
  • Author
    Seliya, Naeem ; Khoshgoftaar, Taghi M. ; Van Hulse, Jason
  • Author_Institution
    Comput. & Inf. Sci. Dept., Univ. of Michigan-Dearborn, Dearborn, MI, USA
  • fYear
    2010
  • fDate
    3-4 Nov. 2010
  • Firstpage
    26
  • Lastpage
    34
  • Abstract
    Reducing the number of latent software defects is a development goal that is particularly applicable to high assurance software systems. For such systems, the software measurement and defect data is highly skewed toward the not-fault-prone program modules, i.e., the number of fault-prone modules is very small. The skewed data problem, also known as class imbalance, poses a unique challenge when training a software quality estimation model. However, practitioners and researchers often build defect prediction models without regard to the skewed data problem. In high assurance systems, the class imbalance problem must be addressed when building defect predictors. This study investigates the roughly balanced bagging (RBBag) algorithm for building software quality models with data sets that suffer from class imbalance. The algorithm combines bagging and data sampling into one technique. A case study of 15 software measurement data sets from different real-world high assurance systems is used in our investigation of the RBBag algorithm. Two commonly used classification algorithms in the software engineering domain, Naive Bayes and C4.5 decision tree, are combined with RBBag for building the software quality models. The results demonstrate that defect prediction models based on the RBBag algorithm significantly outperform models built without any bagging or data sampling. The RBBag algorithm provides the analyst with a tool for effectively addressing class imbalance when training defect predictors during high assurance software development.
  • Keywords
    Bayes methods; decision trees; software fault tolerance; software metrics; software quality; C4.5 decision tree; class imbalance problem; classification algorithm; defect data; defect prediction model; fault prediction; high assurance software systems; naive Bayes; not-fault-prone program modules; roughly balanced bagging; software defects; software engineering domain; software measurement; software quality estimation model; software quality models; Bagging; Data models; Prediction algorithms; Predictive models; Software; Software measurement; bagging; classification; data sampling; defect prediction; imbalanced data; software measurements;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    High-Assurance Systems Engineering (HASE), 2010 IEEE 12th International Symposium on
  • Conference_Location
    San Jose, CA
  • ISSN
    1530-2059
  • Print_ISBN
    978-1-4244-9091-2
  • Electronic_ISBN
    1530-2059
  • Type
    conf
  • DOI
    10.1109/HASE.2010.29
  • Filename
    5634306