Title of article
Reducing over-optimism in variable selection by cross-model validation
Author/Authors
Anderssen, Endre; Dyrstad, Knut; Westad, Frank; Martens, Harald
Issue Information
Biannual, with serial issue number, year 2006
Pages
6
From page
69
To page
74
Abstract
Extensive optimisation of a mathematical model's fit to a relatively small set of empirical data may lead to over-optimistic validation results. If the final, optimised model is assessed with the same validation method and the same input data that were used as the basis for the extensive model optimisation, accumulated spurious correlations may appear as real predictive ability in the final model validation. An example of this is the use of extensive variable selection in multiple regression, based on cross-validation.
To illustrate the over-optimism problem in optimisation based on conventional one-layered validation, an artificial data set containing only random numbers was submitted to regression modelling. The model was optimised by stepwise variable selection. A very good apparent predictive ability for y from X was found in the final model by leave-one-out cross-validation (84%), after the number of X-variables had been reduced stepwise from 500 to 29. Finally, the performance of cross-model validation is tested on one large QSAR data set. Several calibration sets were chosen randomly, and for each a regression model was optimised by variable selection. The prediction accuracy of these models was compared to the cross-validation and cross-model validation results. In these tests, cross-model validation gives the better measure of model predictive ability.
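The random-number experiment described in the abstract can be imitated in a few lines. The following is a minimal sketch under stated assumptions, not the authors' procedure: it replaces stepwise selection with a simple ranking of X-variables by absolute correlation with y, uses ordinary least squares as the regression method, and reports Q² = 1 - PRESS/SS. The sizes (40 samples, 500 variables, 10 retained) and all function names are hypothetical choices for illustration. The "two-layer" variant only captures the central idea of cross-model validation, namely that the variable selection is repeated inside every outer segment so that the left-out sample never influences the selection.

```python
import numpy as np

def top_k_by_correlation(X, y, k):
    """Return indices of the k columns of X with the largest |correlation| to y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(-np.abs(corr))[:k]

def ols_predict(X_train, y_train, X_test):
    """Fit least squares with an intercept on the training data, predict X_test."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef

def loo_q2(X, y, k, select_inside):
    """Leave-one-out Q2 = 1 - PRESS / SS_total.

    select_inside=False: variables are chosen once from ALL samples
                         (one-layered validation, prone to over-optimism).
    select_inside=True:  variables are re-chosen in every segment from the
                         training samples only (two-layer idea).
    """
    n = len(y)
    fixed_cols = None if select_inside else top_k_by_correlation(X, y, k)
    press = 0.0
    for i in range(n):
        train = np.arange(n) != i
        if select_inside:
            cols = top_k_by_correlation(X[train], y[train], k)
        else:
            cols = fixed_cols
        pred = ols_predict(X[train][:, cols], y[train], X[i:i + 1][:, cols])[0]
        press += (y[i] - pred) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def demo(seed=0, n=40, p=500, k=10):
    """Pure-noise data: y has no true relation to any column of X."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    y = rng.standard_normal(n)
    one_layer = loo_q2(X, y, k, select_inside=False)
    two_layer = loo_q2(X, y, k, select_inside=True)
    return one_layer, two_layer

if __name__ == "__main__":
    one_layer, two_layer = demo()
    print(f"Q2, selection outside the validation loop: {one_layer:.2f}")
    print(f"Q2, selection inside the validation loop:  {two_layer:.2f}")
```

Because y is unrelated to X by construction, any clearly positive Q² in the first figure is spurious; the second figure should fall near or below zero, which is the honest answer for pure noise.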
Keywords
Jack-knifing , QSAR , Regression , Over-fitting , Cross-model validation , variable selection
Journal title
Chemometrics and Intelligent Laboratory Systems
Serial Year
2006
Record number
1461732