  • DocumentCode
    2923642
  • Title
    Mining data with numerical attributes and missing attribute values — A rough set approach
  • Author
    Grzymala-Busse, Jerzy W.; Hippe, Zdzislaw S.
  • Author_Institution
    Dept. of Electr. Eng. & Comput. Sci., Univ. of Kansas, Lawrence, KS, USA
  • fYear
    2011
  • fDate
    8-10 Nov. 2011
  • Firstpage
    214
  • Lastpage
    219
  • Abstract
    This paper discusses the challenging problem of mining data sets that have numerical attributes and, at the same time, missing attribute values. We distinguish between two interpretations of missing attribute values: lost values and “do not care” conditions. In our experiments we used the LERS data mining system to induce certain and possible rule sets from the rough set lower and upper approximations, respectively. LERS offers two options for computing approximations, global and local, and we used both. Additionally, we used a probabilistic approach to missing attribute values, one of the most successful traditional methods for handling them. Using the Wilcoxon matched-pairs signed rank test (5% significance level, two-tailed), we observed that the probabilistic approach was either worse than or not better than the rough set approaches.
  • Keywords
    approximation theory; data mining; probability; rough set theory; LERS data mining system; Wilcoxon matched-pair signed rank test; computing approximation; data set mining; missing attribute value; numerical attribute; probabilistic approach; rough set approach; rough set theory; rule sets; Approximation methods; Data mining; Error analysis; Hypertension; Iris; Obesity; Probabilistic logic; Data mining; “do not care” conditions; incomplete data; lost values; rough set theory; rule induction algorithm MLEM2;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Granular Computing (GrC), 2011 IEEE International Conference on
  • Conference_Location
    Kaohsiung
  • Print_ISBN
    978-1-4577-0372-0
  • Type
    conf
  • DOI
    10.1109/GRC.2011.6122596
  • Filename
    6122596
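The two interpretations of missing values named in the abstract change how rough set approximations are built. The sketch below is a minimal illustration, assuming the characteristic-set formulation from Grzymala-Busse's related work on incomplete data: a lost value ("?") keeps a case out of every block of that attribute, a "do not care" value ("*") puts it into every block, and concept lower and upper approximations are unions of characteristic sets K_B(x). This record does not say which definitions LERS uses for its global and local options, so treat this as an illustration, not as the paper's method. Numerical attributes (handled by MLEM2 in the paper) are not modeled. The table, attribute names, and concept are hypothetical toy data.

```python
from functools import reduce

LOST = "?"         # lost value: the original value is unknown and excluded
DO_NOT_CARE = "*"  # "do not care" condition: the case may have any value


def attribute_value_blocks(table, attrs):
    """Map (attribute, value) to the block [(a, v)], a set of case ids.

    A case with value v, or with a "do not care" value, belongs to [(a, v)];
    a case with a lost value belongs to no block of that attribute.
    """
    blocks = {}
    for a in attrs:
        values = {row[a] for row in table.values()} - {LOST, DO_NOT_CARE}
        for v in values:
            blocks[(a, v)] = {
                x for x, row in table.items() if row[a] in (v, DO_NOT_CARE)
            }
    return blocks


def characteristic_set(x, table, attrs, blocks):
    """K_B(x): intersect the blocks of x's known values (missing ones add U)."""
    universe = set(table)
    known = [blocks[(a, table[x][a])] for a in attrs
             if table[x][a] not in (LOST, DO_NOT_CARE)]
    return reduce(set.intersection, known, universe)


def concept_approximations(concept, table, attrs):
    """Concept lower/upper approximations built from characteristic sets."""
    blocks = attribute_value_blocks(table, attrs)
    k = {x: characteristic_set(x, table, attrs, blocks) for x in table}
    # Lower: union of K_B(x), x in the concept, that lie inside the concept.
    lower = set().union(*(k[x] for x in concept if k[x] <= concept))
    # Upper: union of K_B(x) over all x in the concept.
    upper = set().union(*(k[x] for x in concept))
    return lower, upper


# Hypothetical toy table: case id -> attribute values.
TABLE = {
    1: {"temperature": "high", "headache": LOST},
    2: {"temperature": DO_NOT_CARE, "headache": "yes"},
    3: {"temperature": "normal", "headache": "no"},
    4: {"temperature": "high", "headache": "yes"},
}
ATTRS = ["temperature", "headache"]

if __name__ == "__main__":
    print(concept_approximations({3, 4}, TABLE, ATTRS))
```

For the toy concept {3, 4}, K_B(3) = {3} lies inside the concept while K_B(4) = {2, 4} does not, so the lower approximation is {3} and the upper approximation is {2, 3, 4}. In the paper's terms, certain rules would be induced from the first set and possible rules from the second.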