• DocumentCode
    911320
  • Title

    Evolutionary Sampling and Software Quality Modeling of High-Assurance Systems

  • Author

    Drown, Dennis J. ; Khoshgoftaar, Taghi M. ; Seliya, Naeem

  • Author_Institution
    Florida Atlantic Univ., Boca Raton, FL, USA
  • Volume
    39
  • Issue
    5
  • fYear
    2009
  • Firstpage
    1097
  • Lastpage
    1107
  • Abstract
    Software quality modeling for high-assurance systems, such as safety-critical systems, is adversely affected by the skewed distribution of fault-prone program modules. This sparsity of defect occurrence within the software system impedes training and performance of software quality estimation models. Data sampling approaches presented in data mining and machine learning literature can be used to address the imbalance problem. We present a novel genetic algorithm-based data sampling method, named evolutionary sampling, as a solution to improving software quality modeling for high-assurance systems. The proposed solution is compared with multiple existing data sampling techniques, including random undersampling, one-sided selection, Wilson's editing, random oversampling, cluster-based oversampling, synthetic minority oversampling technique (SMOTE), and borderline-SMOTE. This paper involves case studies of two real-world software systems and builds C4.5- and RIPPER-based software quality models both before and after applying a given data sampling technique. It is empirically shown that evolutionary sampling improves performance of software quality models for high-assurance systems and is significantly better than most existing data sampling techniques.
  • Keywords
    genetic algorithms; random processes; safety-critical software; sampling methods; software quality; Borderline-SMOTE; C4.5; RIPPER; Wilson editing; cluster-based oversampling; data mining; evolutionary sampling; genetic algorithm-based data sampling; high-assurance system; machine learning; one-sided selection; random oversampling; random undersampling; safety-critical system; synthetic minority oversampling technique; data sampling; evolutionary computing; imbalanced data; software metrics
  • fLanguage
    English
  • Journal_Title
    Systems, Man and Cybernetics, Part A: Systems and Humans, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1083-4427
  • Type

    jour

  • DOI
    10.1109/TSMCA.2009.2020804
  • Filename
    4967988