• DocumentCode
    2752798
  • Title
    A preliminary study on missing data imputation in evolutionary fuzzy systems of subgroup discovery
  • Author
    Carmona, C.J. ; Luengo, J. ; González, P. ; del Jesus, M.J.
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Jaen, Jaen, Spain
  • fYear
    2012
  • fDate
    10-15 June 2012
  • Firstpage
    1
  • Lastpage
    7
  • Abstract
    In real-life data mining, information is frequently lost because of missing values in the attributes. Missing values can arise from problems in manual data entry procedures, equipment errors or incorrect measurements. Their presence affects the results obtained by any knowledge extraction approach. In subgroup discovery specifically, it can reduce the quality of the subgroups obtained on measures such as sensitivity, confidence, significance or unusualness. This paper presents an experimental study analysing the effect of different missing data imputation mechanisms on the subgroup discovery algorithms based on evolutionary fuzzy systems presented in the literature. The analysis is carried out on a large number of data sets obtained from the KEEL repository. Among all the imputation techniques, K-Nearest Neighbour imputation stands out as the best option. In summary, experts who need to analyse a problem with a high percentage of missing values should use this imputation method in order to treat the data correctly and to obtain meaningful descriptive knowledge. The results also show that NMEEF-SD is the evolutionary fuzzy system with the best results in the missing values scenario.
  • Keywords
    data analysis; data mining; evolutionary computation; fuzzy set theory; fuzzy systems; pattern classification; KEEL repository; NMEEF-SD algorithm; data sets; equipment errors; evolutionary fuzzy systems; incorrect measurements; k-nearest neighbour; knowledge extraction; manual data entry procedures; missing data imputation mechanisms; missing values; subgroup discovery algorithms; Algorithm design and analysis; Argon; Data mining; Educational institutions; Fuzzy systems; Guidelines; Sensitivity; Evolutionary Fuzzy System; Missing Data Imputation; Subgroup Discovery
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fuzzy Systems (FUZZ-IEEE), 2012 IEEE International Conference on
  • Conference_Location
    Brisbane, QLD
  • ISSN
    1098-7584
  • Print_ISBN
    978-1-4673-1507-4
  • Electronic_ISBN
    1098-7584
  • Type
    conf
  • DOI
    10.1109/FUZZ-IEEE.2012.6251182
  • Filename
    6251182