Abstract:
Partial least-squares (PLS) regression is widely used in spectroscopy for calibration and prediction. One of the most important steps in applying PLS regression is determining the correct number of dimensions, in order to avoid over-fitting and so obtain a robust predictive model. The “structured” nature of spectroscopic signals may be used in several ways as a guide to improve PLS models. The aim of this work is to propose a new technique, PoLiSh, for the application of PLS regression to signals (FT-IR, NMR, etc.). The technique is based on Savitzky–Golay (SG) smoothing of the loading weight vectors (w) obtained at each iteration step of the NIPALS procedure. This smoothing progressively “displaces” the random or quasi-random variations from earlier (most important) to later (less important) PLS latent variables. The Durbin–Watson (DW) criterion is calculated for each PLS vector (p, w, b) at each iteration step of the smoothed NIPALS procedure in order to measure the evolution of their “noise” content. PoLiSh was applied to simulated datasets with different noise levels; for those with noise levels higher than 10–20%, an improvement in the predictive ability of the models was observed. The technique also serves as a tool to evaluate the true dimensionality of signal matrices for complex PLS models, by comparing the DW profiles of the PoLiSh vectors at different smoothing degrees with those of the unsmoothed PLS models.
Keywords:
Durbin–Watson, Partial least-squares regression, Dimensionality, Savitzky–Golay smoothing
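
The procedure summarised in the abstract can be illustrated with a minimal Python sketch, given here under several assumptions rather than as the authors' implementation. It assumes a single response (PLS1), mean-centred data, SciPy's `savgol_filter` as the SG smoother, and the DW criterion taken as the sum of squared successive differences of a vector divided by its sum of squares (close to 0 for a smooth vector, close to 2 for white noise). The function names `durbin_watson` and `smoothed_pls1`, and the default window length and polynomial order, are hypothetical choices made for illustration.

```python
import numpy as np
from scipy.signal import savgol_filter


def durbin_watson(v):
    """DW criterion of a vector (assumed form): sum of squared successive
    differences divided by the sum of squares."""
    v = np.asarray(v, dtype=float)
    return np.sum(np.diff(v) ** 2) / np.sum(v ** 2)


def smoothed_pls1(X, y, n_components, window_length=11, polyorder=2):
    """PLS1 by NIPALS with optional Savitzky-Golay smoothing of each w.

    Passing window_length=None skips the smoothing (ordinary PLS1).
    Returns the regression vector b for the full model and the DW values
    of w, p and b recorded at each iteration step.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X = X - X.mean(axis=0)          # mean-centre (copies, inputs untouched)
    y = y - y.mean()
    n_vars = X.shape[1]
    W = np.zeros((n_vars, n_components))
    P = np.zeros((n_vars, n_components))
    q = np.zeros(n_components)
    dw = {"w": [], "p": [], "b": []}
    b = np.zeros(n_vars)

    for a in range(n_components):
        w = X.T @ y                               # raw loading weights
        if window_length is not None:
            w = savgol_filter(w, window_length, polyorder)  # SG smoothing
        w = w / np.linalg.norm(w)
        t = X @ w                                 # scores
        tt = t @ t
        p = X.T @ t / tt                          # X loadings
        q[a] = y @ t / tt                         # y loading
        X = X - np.outer(t, p)                    # deflate X
        y = y - q[a] * t                          # deflate y
        W[:, a], P[:, a] = w, p

        # Regression vector for the model with a + 1 latent variables
        Wa, Pa = W[:, : a + 1], P[:, : a + 1]
        b = Wa @ np.linalg.solve(Pa.T @ Wa, q[: a + 1])

        dw["w"].append(durbin_watson(w))
        dw["p"].append(durbin_watson(p))
        dw["b"].append(durbin_watson(b))

    return b, dw
```

Calling `smoothed_pls1` once with a window length and once with `window_length=None` on the same data gives the two sets of DW profiles whose comparison is described above. New samples would be predicted as `(X_new - X_train.mean(axis=0)) @ b + y_train.mean()`.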