• DocumentCode
    2731934
  • Title

    GA-facilitated KNN classifier optimization with varying similarity measures

  • Author

    Peterson, Michael R. ; Doom, Travis E. ; Raymer, Michael L.

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Wright State Univ., Dayton, OH, USA
  • Volume
    3
  • fYear
    2005
  • fDate
    2-5 Sept. 2005
  • Firstpage
    2514
  • Abstract
    Genetic algorithms (GAs) are powerful tools for k-nearest neighbors (KNN) classifier optimization. While traditional KNN classification techniques typically employ Euclidean distance to assess pattern similarity, other measures may also be utilized. Previous research demonstrates that GAs can improve predictive accuracy by searching for optimal feature weights and offsets for a cosine similarity-based KNN classifier. GA-selected weights determine the classification relevance of each feature, while offsets provide alternative points of reference when assessing angular similarity. Such optimized classifiers perform competitively with other contemporary classification techniques. This paper explores the effectiveness of GA weight and offset optimization for knowledge discovery using KNN classifiers with varying similarity measures. Using Euclidean distance, cosine similarity, and Pearson correlation, untrained classifiers are compared with weight-optimized classifiers for several datasets. Simultaneous weight and offset optimization experiments are also performed for cosine similarity and Pearson correlation. This type of optimization represents a novel technique for maximizing Pearson correlation-based KNN performance. While unoptimized cosine and Pearson classifiers often perform worse than their Euclidean counterparts, optimized cosine and Pearson classifiers typically show equivalent or improved performance over optimized Euclidean classifiers. In some cases, offset optimization provides further improvement for KNN classifiers employing cosine similarity or Pearson correlation.
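    The role of weights and offsets described in the abstract can be illustrated with a minimal sketch. It assumes each feature is first shifted by its offset and then scaled by its weight before similarity is computed; the abstract does not give the paper's exact formulation, so this form, and the hypothetical names `transform` and `knn_predict`, are illustrative assumptions only. The GA search itself is not shown; in this sketch it would supply the `weights` and `offsets` vectors.

```python
import math
from collections import Counter

def transform(x, weights, offsets):
    # Assumed form: shift each feature by its offset, then scale by its weight.
    return [w * (xi - o) for xi, w, o in zip(x, weights, offsets)]

def cosine_similarity(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0

def pearson_correlation(a, b):
    # Pearson correlation equals the cosine of the mean-centered vectors.
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return cosine_similarity([p - mean_a for p in a], [q - mean_b for q in b])

def negative_euclidean(a, b):
    # Negated so that a larger value always means "more similar".
    return -math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def knn_predict(query, train_x, train_y, k, similarity, weights, offsets):
    # Rank training patterns by similarity to the query, then take a majority vote.
    q = transform(query, weights, offsets)
    scored = [(similarity(q, transform(x, weights, offsets)), y)
              for x, y in zip(train_x, train_y)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]
```

    With all weights set to 1 and all offsets set to 0, `knn_predict` behaves as an unoptimized classifier under the chosen measure. A weight of 0 removes a feature from the comparison, and a nonzero offset moves the reference point from which angles are measured, so two patterns that are collinear through the origin need no longer appear identical under cosine similarity.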
  • Keywords
    data mining; genetic algorithms; pattern classification; Euclidean distance; GA facilitated KNN classifier optimization; Pearson correlation; classification relevance; cosine similarity; k-nearest neighbors; knowledge discovery; pattern similarity; varying similarity measures; Accuracy; Biological cells; Data analysis; Feature extraction; Gene expression; Genetic mutations; Pattern recognition; Testing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Evolutionary Computation, 2005. The 2005 IEEE Congress on
  • Print_ISBN
    0-7803-9363-5
  • Type

    conf

  • DOI
    10.1109/CEC.2005.1555009
  • Filename
    1555009