DocumentCode :
2747341
Title :
Reducing overfitting in genetic programming models for software quality classification
Author :
Liu, Yi ; Khoshgoftaar, Taghi
Author_Institution :
Math. & Comput. Sci. Dept., Georgia Coll. & State Univ., Milledgeville, GA, USA
fYear :
2004
fDate :
25-26 March 2004
Firstpage :
56
Lastpage :
65
Abstract :
A high-assurance system is largely dependent on the quality of its underlying software. Software quality models can provide timely estimations of software quality, allowing the detection and correction of faults prior to operations. A software metrics-based quality prediction model may exhibit overfitting, which occurs when a prediction model has good accuracy on the training data but relatively poor accuracy on the test data. We present an approach to address the overfitting problem in the context of software quality classification models based on genetic programming (GP). The problem has not been addressed in depth for GP-based models. The presence of overfitting in a software quality classification model affects its practical usefulness, because management is interested in good performance of the model when applied to unseen software modules, i.e., generalization performance. In the process of building GP-based software quality classification models for a high-assurance telecommunications system, we observed that the GP models were prone to overfitting. We utilize a random sampling technique to reduce overfitting in our GP models. Many researchers have found this approach to be an effective method for reducing the time of a GP run. However, in our study we utilize random sampling to reduce overfitting, with the aim of improving the generalization capability of our GP models.
Keywords :
classification; fault tolerant computing; genetic algorithms; software metrics; software quality; telecommunication; GP models; fault correction; fault detection; genetic programming; high-assurance system; high-assurance telecommunication system; overfitting problem; software modules; software quality classification models; Context modeling; Fault detection; Genetic programming; Predictive models; Quality management; Sampling methods; Software performance; Software quality; Software testing; Training data;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
High Assurance Systems Engineering, 2004. Proceedings. Eighth IEEE International Symposium on
ISSN :
1530-2059
Print_ISBN :
0-7695-2094-4
Type :
conf
DOI :
10.1109/HASE.2004.1281730
Filename :
1281730