• DocumentCode
    1625102
  • Title
    A preprocessing of outlier using KERNEL PCA and factor scores in regression model
  • Author
    Oh, Kyung-Whan ; Jun, Sunghae ; Kim, Yong-Jun
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Sogang Univ., Seoul, South Korea
  • fYear
    2009
  • Firstpage
    2132
  • Lastpage
    2135
  • Abstract
    Analyzing data that contain outliers is more difficult than analyzing data without them. In supervised learning tasks such as classification and regression, outliers can increase the misclassification rate and the variance of the estimates. In clustering, an unsupervised learning task, outliers can form a cluster of their own, which makes the clustering result hard to represent. For these reasons, outliers are generally removed before a model is constructed in data mining. When the outliers carry information about the given data, however, they should not be removed from the training data set. In this paper, we propose a preprocessing method based on kernel PCA (principal component analysis) and factor scores that keeps the outliers in the modeling. The effect of the outliers in the given training data set is reduced by using the values of kernel PCA and the factor scores. We verify the improved performance of our method through experiments with simulated data sets in a regression model.
  • Keywords
    data analysis; data mining; estimation theory; pattern classification; pattern clustering; principal component analysis; regression analysis; unsupervised learning; clustering method; data analysis; data mining; factor score; kernel PCA; misclassification rate; outlier preprocessing method; principal component analysis; regression model; simulation data set; supervised learning; training data set; unsupervised learning; variance estimate; Computer science; Data mining; Intrusion detection; Kernel; Principal component analysis; Statistical analysis; Supervised learning; Testing; Training data; Unsupervised learning;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fuzzy Systems, 2009. FUZZ-IEEE 2009. IEEE International Conference on
  • Conference_Location
    Jeju Island
  • ISSN
    1098-7584
  • Print_ISBN
    978-1-4244-3596-8
  • Electronic_ISBN
    1098-7584
  • Type
    conf
  • DOI
    10.1109/FUZZY.2009.5277180
  • Filename
    5277180