  • DocumentCode
    2131023
  • Title
    Genetic Algorithm and Data Mining Techniques for Design Selection in Databases
  • Author
    Koukouvinos, Christos ; Parpoula, Christina ; Simos, Dimitris E.
  • Author_Institution
    Dept. of Math., Nat. Tech. Univ. of Athens, Athens, Greece
  • fYear
    2013
  • fDate
    2-6 Sept. 2013
  • Firstpage
    743
  • Lastpage
    746
  • Abstract
    Nowadays, variable selection is fundamental to large-dimensional statistical modelling problems, since large databases arise in diverse fields of science. In this paper, we use data mining tools and experimental designs in databases to select the most relevant variables for classification in regression problems where observations and labels of a real-world dataset are available. Specifically, we use health data to identify the most significant variables, i.e. those containing all the information needed to classify and predict new data with respect to a given outcome (survival or death). The main goal is to determine the most important variables using methods from the design of experiments combined with algorithmic concepts from data mining and metaheuristics. Our approach seems promising, since we are able to retrieve an optimal plan using only 6 of the 8862 available runs.
  • Keywords
    data mining; design of experiments; genetic algorithms; health care; medical information systems; pattern classification; regression analysis; support vector machines; very large databases; association rule mining; data classification; data mining techniques; data prediction; design selection; design-of-experiments; genetic algorithm; health data; large databases; large dimensional statistical modelling problems; metaheuristic algorithms; regression problems; variable selection; Algorithm design and analysis; Association rules; Databases; Input variables; feature selection; large dimensional data; metaheuristics; sensitivity analysis
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Availability, Reliability and Security (ARES), 2013 Eighth International Conference on
  • Conference_Location
    Regensburg
  • Type
    conf
  • DOI
    10.1109/ARES.2013.98
  • Filename
    6657314