Conference record number:
5318
Paper title:
Enhanced Data Point Importance for Efficient Data Splitting in Classification Models: Application to Olive Oil Authentication
Authors:
Zare, Zahra (Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, Iran); Vali Zade, Somaye (Halal Research Center of IRI, Food and Drug Administration, Ministry of Health and Medical Education, Tehran, Iran); Abdollahi, Hamid (abd@iasbs.ac.ir; Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, Iran)
Number of pages:
1
Keywords:
Essential points (EP), Data Point Importance (DPI), Enhanced DPI (EDPI)
Year of publication:
1402
Conference title:
Ninth Iranian National Biennial Chemometrics Seminar
Document language:
English
Persian abstract:
Classification models are central to predicting or identifying classes within datasets, and their success depends on how samples are assigned to the training and test sets. Data splitting during preprocessing therefore directly affects the effectiveness and efficiency of the final classification model. In PCA space, data points can be divided into essential and non-essential points. Essential points (EPs) are the vertices of the convex hull built from the data points in a normalized space, and together they form a representative subset of the data; non-essential points lie inside the convex hull. An algorithm called Data Point Importance (DPI) was recently introduced [1] to rank EPs by importance, which allows the information in a dataset to be sorted and samples to be selected. DPI is an easily calculated value that reflects the impact of each data point on preserving the pattern of the data structure. This research extends the concept of DPI to non-essential points, establishing an order of importance for all data points. The study evaluates the resulting Enhanced DPI (EDPI) and its use in selecting the important points that affect the efficiency of class modeling. In the proposed algorithm, data points are examined as layered convex hulls and their order of importance is evaluated. EDPI ranks all data points (samples) in the row space of the training set of the target class by their significance in preserving the data structure. The approach is applied to class modeling with DD-SIMCA for authenticating extra virgin olive oil samples, and sample selection by the EDPI strategy is compared with the Kennard-Stone (KS) method.
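The layered convex hull idea described in the abstract can be illustrated with a minimal sketch. It assumes two-dimensional scores (for example, the first two principal components) and only assigns each sample a hull layer: layer 0 holds the outer hull vertices (the essential points), layer 1 the vertices of the hull of what remains, and so on. It does not compute DPI or EDPI values, whose formula is not given in this record; the function names `hull_vertices` and `hull_layers` are hypothetical.

```python
def _cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def hull_vertices(points, idx):
    """Return the indices (taken from idx) of the convex hull vertices
    of 2-D points, using Andrew's monotone chain algorithm."""
    order = sorted(idx, key=lambda i: (points[i][0], points[i][1]))
    if len(order) <= 2:
        return order

    def half_hull(seq):
        chain = []
        for i in seq:
            # Drop the last vertex while it does not make a strict left turn.
            while len(chain) >= 2 and _cross(points[chain[-2]],
                                             points[chain[-1]],
                                             points[i]) <= 0:
                chain.pop()
            chain.append(i)
        return chain

    lower = half_hull(order)
    upper = half_hull(reversed(order))
    # The last point of each half is the first point of the other half.
    return lower[:-1] + upper[:-1]


def hull_layers(points):
    """Peel convex hulls one at a time and return a layer index per point."""
    layers = [None] * len(points)
    remaining = list(range(len(points)))
    layer = 0
    while remaining:
        verts = set(hull_vertices(points, remaining))
        for i in verts:
            layers[i] = layer
        remaining = [i for i in remaining if i not in verts]
        layer += 1
    return layers


# Illustrative data: an outer square, an inner square, and a centre point.
scores = [(0, 0), (4, 0), (4, 4), (0, 4),
          (1, 1), (3, 1), (3, 3), (1, 3),
          (2, 2)]
print(hull_layers(scores))
```

How samples are ordered within a layer is left open here; in the abstract that ordering comes from the EDPI value itself.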
The study uses Raman spectra of pure samples and of samples adulterated with various oils to develop one-class models for assessing the authenticity and adulteration of extra virgin olive oil. Figure 1 shows the ranking of data points (samples) obtained with the EDPI strategy; the proposed data-splitting method outperforms the KS method in many cases [1].
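For readers unfamiliar with the Kennard-Stone (KS) baseline mentioned above, the following is a minimal sketch of its standard form: start from the two mutually farthest samples, then repeatedly add the sample whose distance to its nearest already-selected sample is largest. Euclidean distance, the function name `kennard_stone`, and the toy data are assumptions for illustration; the record does not say which distance or preprocessing the authors used.

```python
import math


def kennard_stone(points, n_select):
    """Return the indices of n_select samples chosen by the Kennard-Stone rule."""
    n = len(points)
    if not 0 < n_select <= n:
        raise ValueError("n_select must be between 1 and the number of points")
    if n == 1:
        return [0]

    # Full pairwise Euclidean distance matrix (fine for small data sets).
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Seed the selection with the two mutually farthest samples.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second][:n_select]

    # Distance from every sample to its nearest selected sample.
    min_d = [min(dist[i][s] for s in selected) for i in range(n)]

    while len(selected) < n_select:
        chosen = set(selected)
        # Pick the unselected sample that is farthest from the selected set.
        nxt = max((i for i in range(n) if i not in chosen),
                  key=lambda i: min_d[i])
        selected.append(nxt)
        for i in range(n):
            min_d[i] = min(min_d[i], dist[i][nxt])
    return selected


# Illustrative data: five samples along a line.
samples = [(0, 0), (1, 0), (2, 0), (10, 0), (5, 0)]
print(kennard_stone(samples, 4))
```

The selected indices are typically used as the training set and the remaining samples as the test set.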
Country:
Iran