DocumentCode :
1549617
Title :
Comparing software prediction techniques using simulation
Author :
Shepperd, Martin ; Kadoda, Gada
Author_Institution :
Sch. of Design, Eng. & Comput., Bournemouth Univ., Poole, UK
Volume :
27
Issue :
11
fYear :
2001
fDate :
11/1/2001 12:00:00 AM
Firstpage :
1014
Lastpage :
1022
Abstract :
The need for accurate software prediction systems increases as software becomes larger and more complex. We believe that the underlying characteristics of the data set (size, number of features, type of distribution, etc.) influence the choice of the prediction system to be used. For this reason, we would like to control the characteristics of such data sets in order to systematically explore the relationship between accuracy, choice of prediction system, and data set characteristics. It would also be useful to have a large validation data set. Our solution is to simulate data, allowing both control and the possibility of a large number (1000) of validation cases. We compare four prediction techniques: regression, rule induction, nearest neighbor (a form of case-based reasoning), and neural nets. The results suggest that there are significant differences depending upon the characteristics of the data set. Consequently, researchers should consider prediction context when evaluating competing prediction systems. We observed that the more "messy" the data and the more complex the relationship with the dependent variable, the more variability in the results. In the more complex cases, we observed significantly different results depending upon the particular training set that has been sampled from the underlying data set. However, our most important result is that it is more fruitful to ask which is the best prediction system in a particular context rather than which is the "best" prediction system.
Keywords :
case-based reasoning; learning (artificial intelligence); neural nets; software metrics; virtual machines; data set characteristics; machine learning; nearest neighbor; prediction problem; regression; rule induction; simulation; small data sets; software prediction systems; software prediction technique comparison; training set; Accuracy; Control systems; Data engineering; Helium; Machine learning; Nearest neighbor searches; Neural networks; Predictive models; Software systems; Uncertainty
fLanguage :
English
Journal_Title :
Software Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
0098-5589
Type :
jour
DOI :
10.1109/32.965341
Filename :
965341