Title of article :
Non-linear methods for multivariate statistical calibration and their use in palaeoecology: a comparison of inverse (k-nearest neighbours, partial least squares and weighted averaging partial least squares) and classical approaches
Author/Authors :
ter Braak, Cajo J.F.
Abstract :
Current environmental problems, such as acid rain and global warming, have greatly increased interest in fossil species assemblages as indicators of the palaeoenvironment and thus in quantitative methods for reconstructing environmental variables from species assemblage data. The ensuing multivariate calibration problem appears to be even harder than that of spectroscopic calibration, primarily because the basic model is unimodal (Shelford's law of tolerance) instead of being linear (Beer's law). The strong non-linearity has led to the use of non-parametric calibration methods, in particular the smooth response surface method (SRS) and the method of best modern analogues, alias k-nearest neighbours (k-NN), and to a form of non-linear partial least squares (PLS), called weighted averaging partial least squares (WA-PLS), specially designed to analyze unimodal data. SRS and k-NN are recognized as non-parametric smoothing versions of the classical and inverse approach to linear calibration, respectively, whereas PLS and WA-PLS are inverse methods that bring in the aspect of dimension reduction. In a comparison on 'realistically looking' simulated compositional data with 100 training samples and 500 independent evaluation samples, WA-PLS and k-NN outperformed PLS when the species response functions were unimodal. For such data, k-NN resisted the curse of dimensionality. However, when the response functions were near-linear, WA-PLS and PLS performed about equally and clearly outperformed k-NN. On other simulated data, simultaneous calibration of two climate variables via a parametric non-linear classical method was compared with individual calibrations via inverse methods. The simultaneous calibration method was better at the border of the sampled space than the best inverse method (WA-PLS) and much better than k-NN. The simulations demonstrated the limitations of the leave-one-out estimate of prediction error: it showed severe method-dependent bias.
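Illustrative sketch (not from the article): the k-NN, or best modern analogues, approach named in the abstract can be tried on toy data as below. Everything specific here is an assumption made for illustration: the Gaussian response curves, the single gradient, the number of species, the tolerance value, the squared chord distance and k = 5. Only the sample sizes (100 training, 500 evaluation) follow the abstract, and the simulation is far simpler than the one the author describes.

```python
import numpy as np


def simulate_assemblages(x, optima, tolerance=10.0):
    """Noise-free unimodal (Gaussian) species responses, as proportions.

    Hypothetical toy model: species j peaks where x equals optima[j].
    """
    resp = np.exp(-0.5 * ((x[:, None] - optima[None, :]) / tolerance) ** 2)
    # Normalise each sample so abundances sum to 1 (compositional data).
    return resp / resp.sum(axis=1, keepdims=True)


def knn_predict(Y_train, x_train, Y_new, k=5):
    """Predict x as the mean over the k most similar training assemblages."""
    # Squared chord distance between each new and each training sample
    # (an assumed dissimilarity; the abstract does not name one).
    diff = np.sqrt(Y_new)[:, None, :] - np.sqrt(Y_train)[None, :, :]
    dist = (diff ** 2).sum(axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return x_train[nearest].mean(axis=1)


def rmsep(x_true, x_pred):
    """Root mean squared error of prediction."""
    return float(np.sqrt(np.mean((x_true - x_pred) ** 2)))


rng = np.random.default_rng(0)
optima = np.linspace(-10, 110, 30)      # 30 hypothetical species optima
x_train = rng.uniform(0, 100, 100)      # 100 training samples
x_test = rng.uniform(0, 100, 500)       # 500 independent evaluation samples

Y_train = simulate_assemblages(x_train, optima)
Y_test = simulate_assemblages(x_test, optima)

x_hat = knn_predict(Y_train, x_train, Y_test, k=5)
error = rmsep(x_test, x_hat)
print(f"k-NN RMSEP on independent evaluation samples: {error:.2f}")
```

Evaluating on a separate set of samples, rather than by leave-one-out, mirrors the abstract's point that the leave-one-out estimate of prediction error can be biased in a method-dependent way.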