Conference record number:
5318
Paper title:
Enhanced Data Point Importance for Subset Selection in Partial Least Square Regression: A Comparative Study with Kennard-Stone Method
Authors:
Vazifeh Solout Mahya (Chemistry Faculty, School of Sciences, University of Tehran, Tehran, POB 14155-6455, Iran); Vali Zade Somaye (Halal Research Center of IRI, Food and Drug Administration, Ministry of Health and Medical Education, Tehran, Iran); Abdollahi Hamid (Faculty of Chemistry, Institute for Advanced Studies in Basic Sciences, 45195-1159, Zanjan, Iran); Ghasemi Jahan Bakhsh (Chemistry Faculty, School of Sciences, University of Tehran, Tehran, POB 14155-6455, Iran)
Number of pages:
1
Year of publication:
1402
Conference title:
9th Iranian Biennial National Chemometrics Seminar
Document language:
English
Persian abstract:
When multivariate analysis is applied to a dataset, whether single-block data (PCA, MCR, SIMCA) or multi-block data (PCR, PLS), choosing a subset of samples from the complete dataset becomes essential. This procedure is referred to as subset selection. A subset is a smaller, representative portion of the dataset that is used to build, refine, or validate the model. The characteristics of the subset chosen for a calibration model depend on the specific goals and requirements of the calibration process. The subset should accurately represent the overall characteristics of the entire dataset and capture the patterns, trends, and variations present in the data, so the choice of subset is a critical step. We propose a new method for subset selection based on data point importance (DPI) in partial least squares (PLS) regression. In PLS space, data points can be categorized into essential and non-essential points. Essential points (EPs) are the vertices of the convex hull built from the data points in a normalized space, and they form a representative set of the data. Non-essential points, on the other hand, lie inside the convex hull. Recently, an algorithm called Data Point Importance (DPI) was introduced [1] to determine the order of importance of the EPs, enabling the information to be sorted and samples to be selected within the dataset. DPI provides an easily calculated value that reflects the impact of each data point on preserving the pattern of the data structure. This research extends the concept of DPI to non-essential points, establishing the order of importance for all data points and sorting the information for each of them. The study evaluates the idea of Enhanced DPI (EDPI) and its application to selecting important points for subset selection in PLS regression.
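The split into essential and non-essential points can be illustrated with a minimal sketch. It assumes a two-dimensional score space and uses a standard monotone-chain convex hull; the `scores` values and the function names are hypothetical, the normalization step is omitted because the abstract does not specify it, and the DPI value from [1] is not computed here.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop points that would make a clockwise or collinear turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def split_essential(points):
    """Separate hull vertices (essential) from interior points (non-essential)."""
    hull = set(convex_hull(points))
    essential = [p for p in points if p in hull]
    non_essential = [p for p in points if p not in hull]
    return essential, non_essential


# Hypothetical 2-D scores (e.g. two latent variables), for illustration only.
scores = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0),
          (2.0, 1.0), (1.0, 2.0), (3.0, 1.5)]
essential, non_essential = split_essential(scores)
print("essential:", essential)
print("non-essential:", non_essential)
```

In more than two dimensions a general-purpose hull routine would be needed; the two-dimensional case is used here only to keep the sketch self-contained.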
The algorithm we present analyzes the data points through layered convex hulls and assesses their relative importance. All data points (samples) in the training set are ranked using EDPI, which determines their relevance in maintaining the integrity of the data structure within the row space. The study also compares the results of sample selection using the EDPI strategy with those obtained via the Kennard-Stone (KS) method. Figure 1 depicts the ranking of the data points (samples) obtained with the Enhanced DPI strategy and shows that the proposed data splitting method performs comparably to the KS approach on the corn data.
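As a rough illustration of the two ideas compared above, the sketch below peels successive convex hull layers from a set of two-dimensional points and, separately, selects samples with the Kennard-Stone max-min distance rule. It is not the EDPI algorithm: the layer index stands in for a coarse ranking only, the order within a layer is simply input order, and the `scores` data and function names are hypothetical.

```python
import math


def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def hull_layers(points):
    """Peel convex hulls repeatedly; returns layers, outermost first."""
    remaining = list(dict.fromkeys(points))  # drop duplicates, keep order
    layers = []
    while remaining:
        hull = set(convex_hull(remaining))
        layers.append([p for p in remaining if p in hull])
        remaining = [p for p in remaining if p not in hull]
    return layers


def kennard_stone(points, k):
    """Indices of k points picked by the Kennard-Stone max-min rule."""
    n = len(points)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of points")
    # Start from the two mutually most distant points.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda ij: math.dist(points[ij[0]], points[ij[1]]),
    )
    selected = [first, second]
    while len(selected) < k:
        candidates = [i for i in range(n) if i not in selected]
        # Pick the candidate farthest from its nearest selected point.
        nxt = max(
            candidates,
            key=lambda i: min(math.dist(points[i], points[s]) for s in selected),
        )
        selected.append(nxt)
    return selected


# Hypothetical 2-D scores, for illustration only.
scores = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0),
          (1.0, 1.0), (3.0, 1.0), (2.0, 2.5), (2.0, 1.5)]
layers = hull_layers(scores)
ks_idx = kennard_stone(scores, 4)
for depth, layer in enumerate(layers, start=1):
    print(f"layer {depth}: {layer}")
print("Kennard-Stone indices:", ks_idx)
```

On this toy data both approaches favor the outermost points first, which is the kind of agreement a comparison between a hull-based ranking and KS would look for; real data would of course require the actual EDPI values.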
Country:
Iran