• DocumentCode
    2910043
  • Title
    Feature selection and classification in bioscience/medical datasets: study of parameters and multi-objective approach in Two-Phase EA/k-NN method
  • Author
    Dissanayake, Manjula SB; Corne, David W.
  • Author_Institution
    Dept. of Comput. Sci., Heriot-Watt Univ., Edinburgh, UK
  • fYear
    2010
  • fDate
    8-10 Sept. 2010
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Feature selection continues to grow in importance in many areas of science and engineering as large datasets become increasingly common. In particular, bioscience and medical datasets routinely contain several thousand features. Effective data mining in such datasets requires tools that can reliably identify the most relevant features. Identifying these features is a useful goal in itself (e.g. such features may be putative drug targets), and it also improves, perhaps drastically, both the speed of machine learning algorithms on the dataset and the quality of the resulting predictive models. Among the many feature selection methods studied, previous work has shown promise for an evolutionary algorithm/classifier combination (EA/k-NN) which, in successive phases of the same algorithm, serves first as the feature selection mechanism and second as the machine learning method yielding an accurate classifier. Here, we follow up that work by investigating the configuration and parametrisation of the two phases, including multi-objective approaches for one or both phases. Following tests on three datasets, we find: further evidence that the two-phase approach is effective, with results on the most difficult dataset highly competitive with the literature; inconclusive results concerning the ideal way to configure the two phases; and evidence in support of using a multi-objective approach in one or both phases.
  • Keywords
    data mining; evolutionary computation; learning (artificial intelligence); medical computing; pattern classification; bioscience dataset; evolutionary algorithm; feature classification; feature selection; k-nearest neighbor classifier; machine learning; medical datasets; Accuracy; Artificial neural networks; Biological cells; Cancer; DNA; Support vector machines; Training
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence (UKCI), 2010 UK Workshop on
  • Conference_Location
    Colchester
  • Print_ISBN
    978-1-4244-8774-5
  • Electronic_ISBN
    978-1-4244-8773-8
  • Type
    conf
  • DOI
    10.1109/UKCI.2010.5625581
  • Filename
    5625581
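
The two-phase EA/k-NN idea summarised in the abstract can be illustrated with a minimal sketch. The abstract does not give the algorithm's details, so everything specific here is an assumption made for illustration: the bit-mask encoding, truncation selection with single bit-flip mutation, leave-one-out k-NN accuracy as the fitness, selection of features by how often phase-1 runs pick them, and all function names and parameter values (`knn_loo_accuracy`, `evolve`, `two_phase`, `runs`, `keep`). It is single-objective only and does not attempt the multi-objective variants the paper investigates.

```python
import random


def knn_loo_accuracy(X, y, mask, k=3):
    """Leave-one-out k-NN accuracy using only features whose mask bit is 1."""
    idx = [j for j, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for i, xi in enumerate(X):
        # Squared Euclidean distance to every other sample on the chosen features
        dists = sorted(
            (sum((xi[j] - xo[j]) ** 2 for j in idx), y[o])
            for o, xo in enumerate(X)
            if o != i
        )
        votes = [label for _, label in dists[:k]]
        predicted = max(set(votes), key=votes.count)  # majority vote
        correct += predicted == y[i]
    return correct / len(X)


def evolve(X, y, n_features, rng, generations=30, pop_size=20, k=3):
    """Tiny EA over feature bit masks (assumed operators, not the paper's)."""
    def fitness(mask):
        return knn_loo_accuracy(X, y, mask, k)

    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: keep the better half as parents
        parents = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        for parent in parents:
            child = parent[:]
            child[rng.randrange(n_features)] ^= 1  # flip one random bit
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)


def two_phase(X, y, runs=5, keep=2, seed=0):
    """Phase 1 selects features; phase 2 evolves a classifier on them."""
    rng = random.Random(seed)
    n = len(X[0])
    # Phase 1: repeat the EA and count how often each feature is selected
    counts = [0] * n
    for _ in range(runs):
        for j, bit in enumerate(evolve(X, y, n, rng)):
            counts[j] += bit
    selected = sorted(range(n), key=lambda j: counts[j], reverse=True)[:keep]
    # Phase 2: rerun the same EA restricted to the selected features
    X_reduced = [[row[j] for j in selected] for row in X]
    mask = evolve(X_reduced, y, len(selected), rng)
    final_features = [j for j, bit in zip(selected, mask) if bit]
    return final_features, knn_loo_accuracy(X_reduced, y, mask)
```

Calling `two_phase(X, y)` with `X` as a list of numeric feature rows and `y` as a list of class labels returns the indices of the features used by the final classifier together with its leave-one-out accuracy. Datasets with thousands of features, as described in the abstract, would need a vectorised distance computation and a held-out test set rather than this pure-Python loop.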