DocumentCode :
3603250
Title :
Sample Weighting: An Inherent Approach for Outlier Suppressing Discriminant Analysis
Author :
Chuan-Xian Ren ; Dao-Qing Dai ; Xiaofei He ; Hong Yan
Author_Institution :
Intell. Data Center (IDC) & Dept. of Math., Sun Yat-Sen Univ., Guangzhou, China
Volume :
27
Issue :
11
fYear :
2015
Firstpage :
3070
Lastpage :
3083
Abstract :
As data acquisition technologies develop rapidly, both the volume and the variety of data keep growing. However, noise and outliers usually accompany the data and degrade the real performance of learning algorithms in data mining and pattern analysis. To address this problem, the importance of each individual sample in building the optimal subspace is explored, and an importance-sampling-inspired method is proposed for outlier-suppressing feature extraction. First, each sample is assigned a weight, which is estimated by the graph Laplacian, and an approximated mean is then calculated for each subject. By highlighting the most subject-oriented samples, the weighted average and the scatter metrics can be measured with maximum margins and superior classification performance. The supervised information integrates local data structure with the samples' respective contributions to building the optimal subspace. The linear criterion can be extended to the nonlinear case by the kernel trick. A regularization framework is proposed to deal with the rank-deficiency problem, which is usually induced by the small sample size of the training set. The competitive performance of the algorithm has been validated by extensive experiments on synthetic and benchmark data, including facial images and gene microarray data.
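The pipeline outlined in the abstract (per-sample weights, weighted class means, weighted scatter, regularization) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the weight is the normalized degree of a heat-kernel affinity graph (the diagonal of the graph Laplacian L = D - W), and the function names `graph_weights` and `weighted_discriminant` and the parameters `sigma` and `reg` are hypothetical. The paper's actual weight estimator and criterion may differ.

```python
import numpy as np

def graph_weights(X, sigma=1.0):
    """Assumed weighting: normalized degree of a heat-kernel affinity graph.

    Samples far from the rest of their class get a small degree and
    therefore a small weight. This is an illustrative choice only.
    """
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    affinity = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(affinity, 0.0)          # no self-loops
    degree = affinity.sum(axis=1)            # diagonal of D in L = D - W
    total = degree.sum()
    if total == 0.0:                         # e.g. a class with one sample
        return np.full(X.shape[0], 1.0 / X.shape[0])
    return degree / total

def weighted_discriminant(X, y, n_components=1, sigma=1.0, reg=1e-3):
    """Weighted, regularized linear discriminant projection (sketch)."""
    n, d = X.shape
    means, priors = [], []
    Sw = np.zeros((d, d))                    # weighted within-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        w = graph_weights(Xc, sigma)
        m = w @ Xc                           # weighted (approximated) class mean
        centered = Xc - m
        prior = Xc.shape[0] / n
        Sw += prior * (centered * w[:, None]).T @ centered
        means.append(m)
        priors.append(prior)
    means, priors = np.array(means), np.array(priors)
    diffs = means - priors @ means           # deviations from the overall mean
    Sb = (diffs * priors[:, None]).T @ diffs # between-class scatter
    # The ridge term reg * I keeps Sw invertible when samples are scarce.
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw + reg * np.eye(d), Sb))
    order = np.argsort(-evals.real)[:n_components]
    return evecs[:, order].real
```

Calling `weighted_discriminant(X, y)` on an `(n, d)` array `X` with integer labels `y` returns a `(d, n_components)` projection matrix. The ridge term stands in for the regularization framework mentioned in the abstract; a kernelized variant is not shown.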
Keywords :
approximation theory; data mining; data structures; feature extraction; graph theory; pattern classification; approximated mean; classification performance; data acquirement technologies; discriminant analysis; graph Laplacian; importance-sampling-inspired method; kernel trick; linear criterion; local data structure; maximum margins; outlier; pattern analysis; rank-deficient problem; regularization framework; sample weighting; scatter metrics; weighted average; Buildings; Covariance matrices; Estimation; Kernel; Optical wavelength conversion; Support vector machines; Training; Importance sampling; Regularization
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2015.2448547
Filename :
7130630